SUMMARY



The present study addresses itself to the problem of designing an
automated system for instruction in programming, and also to the study
of problem-solving behavior, as exhibited by students using a CAI course
in computer programming.

The study uses computer programs written by 40 college students
during the winter and spring quarters of 1972 as part of a CAI course
in AID (Algebraic Interpretive Dialogue), an algebraic language similar
to BASIC. The course is self-contained and consists of 50 tutorial
lessons described in detail in Friend (1973).

The programs analyzed were written as solutions to 25 programming
problems from the course; yU? solutions containing 7063 commands were
analyzed. The distribution of the data over problems and over students
is discussed. Problem difficulty and diversity of student solutions are
also discussed in detail.



CHAPTER I

Introduction 

One of the major design problems in implementing computer-assisted
instruction (CAI) courses in computer programming is that of analyzing
student-written programs in real time. An instructional program capable
of providing response-sensitive, specific corrective instruction for
student programming efforts should have several attributes, including
the ability to identify overt errors and to deliver unambiguous error
messages, the ability to determine whether or not a student's program
is a correct solution, and the ability to determine from a partially
written or nonfunctioning program the strategy preferred by the student
and to give assistance on that basis.

The first of these attributes, the ability to identify overt errors
and to deliver the appropriate messages, is usually seen as one of the
functions of the compiler or interpreter. Syntax errors and a few kinds
of semantic errors fall in this category. Overt semantic errors that
are problem-dependent (such as the use of an incorrect algebraic formula)
cannot be relegated to the interpreter; they must be handled by a routine
that has access to curriculum data. However, no 'perfect' algorithm for
the detection of problem-dependent errors can be written, and an 'ideal
programming consultant' of the type described above cannot be unambiguously
defined.

Similarly, an algorithm that provides the second 'ideal' attribute,
the ability to determine the correctness of a student's solution, does
not exist. It may be possible, however, to design a strategy that
usefully approaches this goal for practical purposes, and it is one of
the aims of this study to investigate how close such a heuristic solution
might come.

As for the third desired attribute of an ideal programming consultant,
the problem of guiding a student along his chosen path, the problem
itself is poorly defined, and the final chapters of this report (in which a
study of the diversity of correct solutions is presented) attempt to
provide a greater understanding of the problem and to present a research
method and results that may lead to a workable solution.

The present study addresses itself to the problem of designing an
automated system for instruction in programming, and also to the study
of problem-solving behavior, as exhibited by students using a CAI course
in computer programming.

The study uses computer programs written by 40 college students
during the winter and spring quarters of 1972 as part of a CAI course
in AID (Algebraic Interpretive Dialogue), an algebraic language similar
to BASIC. The course is self-contained and consists of 50 tutorial
lessons described in detail in Friend (1973a).

The AID course consists of two computer programs, one that presents
instructional material to the student, and a second, the AID interpreter,
which the student uses when writing and debugging his own AID programs.
The latter program was provided by Digital Equipment Corporation,
manufacturer of the computer system on which the instructional system is
implemented. The interpreter was modified to allow for the collection
of student responses.



The programs analyzed were written as solutions to 25 programming
problems from the AID course. They were chosen to test programming
ability and were expected to be among the most difficult problems in
the course, an expectation that was confirmed. Not all students
attempted every problem; in all, solutions containing 7063 programming
commands were analyzed.

The format and method of presentation are the same for all of the
problems. After an introductory instruction, a problem is stated in
simple English by the instructional program. The student types the
command 'AID' to call the AID interpreter, and he then attempts to write
and debug a program to solve the given problem. During the time he uses
the AID interpreter, he is interacting with the computer as a
professional programmer would. His attempts are not monitored by the
instructional program, and he is free to use any programming devices he chooses
(he may even write programs that are unrelated to the given problem).
The only instruction he receives while using the AID interpreter is in
the form of error messages given by the interpreter if he attempts to
execute an incorrectly formed command or program. The student may write
and execute any syntactically correct AID commands and may delete or
replace any command. He may also file programs in disk storage or
recall previously filed programs. When he has completed the program to
his satisfaction, or has given up, he recalls the instructional program
by typing the command 'INST'.

The instructional program, which does not have access to the
student's program nor the ability to analyze it, attempts to determine
whether the solution is correct by asking for selected results obtained
during execution of the program. If the student's report of the
performance of his program indicates that the solution is correct, the
lessons continue; if not, additional instruction is given and the
student may be asked to call the AID interpreter again for another trial.

The sequence of instruction is under the control of the student.
He may work exercises in any sequence, skipping some exercises, returning
to others for review, etc. Hence, a student may encounter a problem
several times.

The amount of instruction varies, depending both upon student
performance and upon the desires of the student. Usually, after an incorrect
response to an item, one short corrective message is given automatically.
The student may call for additional instruction, in the form of 'hints',
by typing a question mark. Some of the hints contain explicit help and
occasionally even give a complete correct solution to the problem.
Because of the variation in the amount of instruction intervening
between the student's first trial at a given programming problem and
subsequent trials, only the data collected during first trials have been
analyzed in detail.

Hints are also available to the student before his first attempt
and, again, some of the hints are quite explicit. (The use of hints is
discussed, with examples, in Chapter II.) For this reason, and because
students are free to chart their own pathway through the course, the
amount of instruction received before the first trial varies and affects
the proportion of correct solutions, a fact that must be kept in mind
when interpreting the results reported here.



An illustration of a student's interaction with the instructional
program and the AID interpreter follows (the problem is real, the student
hypothetical). In this example, and throughout this report, the problem
and the student's responses are shown in upper case as they appear on
the Model-33 Teletype, the terminal most commonly used for the AID course.
The asterisk to the left of the student's response signifies to the
student that the system is ready to receive input. The problem shown
below is taken from Lesson 11 and is one of the problems used in the
research reported here.

WRITE A PROGRAM THAT WILL LIST THE        The instructional program states
RADIUS, DIAMETER, CIRCUMFERENCE, AND      the problem.
AREA OF A CIRCLE OF RADIUS R.  THEN
USE THE PROGRAM FOR R = 10, 20, 30,
40, AND 50.
*?                                        The student immediately asks for
                                          a hint...
USE THESE FORMULAS (R STANDS FOR          ...which is given by the
RADIUS):                                  instructional program.  Formulas
D = 2 * R                                 known by most students are given
C = 2 * 3.14159265 * R                    in the hints rather than in the
A = 3.14159265 * R↑2                      problem statements.
*AID                                      The student then calls the AID
                                          interpreter.
*SET D = 2*R                              The student starts to write a
                                          program but inadvertently omits
                                          the step number...
R = ?                                     ...and receives an error message
                                          indicating that he attempted to
                                          use an undefined variable.
*5.1 SET D = 2*R                          The student corrects the error and
*5.2 SET C = 2*3.14*R                     continues to write the program
*5.3 SET A = 3.14*D↑2                     without syntax errors.
*5.4 TYPE R, D, C, A
*DO PART 5                                The student tries to execute the
                                          program, but fails to specify a
                                          value for R...
ERROR AT STEP 5.1: R = ?                  ...and receives an error message.
*DO PART 5 FOR R = 1                      The student correctly executes
     R = 1                                the program.
     D = 2
     C = 6.28
     A = 12.56
*5.3 SET A = 3.14*R↑2                     The student replaces Step 5.3 to
                                          correct an algebraic error...
*DO PART 5 FOR R = 10(10)50               ...and then executes the program
     R = 10                               for the values of R specified in
                                          the problem.
*INST                                     After the program stops, the student
                                          calls the instructional program...
WHAT IS THE CIRCUMFERENCE OF A            ...which asks for a selected result.
CIRCLE WITH RADIUS 30?
*188.4                                    The student reports the result
                                          obtained...
CORRECT                                   ...and is judged correct by the
                                          instructional program.  The
                                          instructional program used 3.14159265
                                          as an approximation for π rather
                                          than 3.14 as used by the student.
                                          For this problem, any response that
                                          agrees with the coded correct answer
                                          to four significant digits is
                                          considered correct.



Had the student's response to the last question indicated that his
program was not functioning correctly, he would have been told to ask for
as many hints as he needed and then to try again. The algebraic formulas
would have been repeated in the hints, with the final hint giving a
correct solution that the student could copy.
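
The acceptance rule mentioned in the example (agreement with the coded correct answer to four significant digits) can be illustrated with a short Python sketch. This is not the course's actual routine: the function names are hypothetical, and the sketch assumes that "agrees to four significant digits" means the two values match after truncation (rather than rounding) to four digits, which is the reading under which a response of 188.4 matches a coded answer computed with 3.14159265.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_significant(value: str, digits: int = 4) -> Decimal:
    """Truncate a decimal numeral to the given number of significant digits."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the power of ten of the most significant digit.
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return d.quantize(quantum, rounding=ROUND_DOWN)


def agrees(response: str, coded_answer: str, digits: int = 4) -> bool:
    """Hypothetical check: equal after truncation to `digits` significant digits."""
    return truncate_significant(response, digits) == truncate_significant(coded_answer, digits)


# Coded answer for the circumference of a circle of radius 30.
coded = str(2 * Decimal("3.14159265") * 30)
print(agrees("188.4", coded))
print(agrees("188.0", coded))
```

Under a rounding interpretation the same response would be rejected, so the choice between truncation and rounding is the main assumption in this sketch.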

Had the above work been done by a real student, all student input
between the two commands 'AID' and 'INST' would have been stored as data
by the AID interpreter. (Other responses, including the initial request
for a hint, would have been collected by the instructional program, and
are not analyzed in this report.) Except for characters or lines erased
by the student immediately after they are typed, every character typed
by the student is recorded. In addition, a small amount of bookkeeping
information (the student's identification, the problem number, and the
date and time) is stored with each data block.

The data are presented in some detail in Chapter III, following the
description of the programming problems in Chapter II. The distribution
of the data over problems and over students is discussed, summary
statistics of the number of AID commands typed and the number of commands
executed are given, and the number of occurrences of different kinds of
AID commands is reported and compared with the predicted proportions of
the kinds of commands.

In the remaining chapters, two different kinds of analysis are
presented. Discussion of problem difficulty occupies Chapters IV, V,
and VI. Chapter IV describes several methods for determining the
proportion of correct and partially correct solutions, and derives the
statistics that are used later in the development of formulas for
measuring problem difficulty; the distribution of correct solutions is
also discussed. Chapter V reports an analysis of overt errors. Errors
are classified as either syntactic or semantic, and each of these classes
is further subdivided. The distribution of errors over students is shown,
and a measure of error rates is developed. Both the number of errors and
the error rates are used in measuring problem difficulty. In Chapter VI,
19 measures of problem difficulty are defined, and the correlations
between pairs of measures are given. Ten characteristics of the problems
that might affect problem difficulty are discussed and measured. These
ten characteristics were used as independent variables in stepwise
multiple linear regressions from which linear formulas for predicting
problem difficulty were developed. The analysis of problem difficulty
reported is similar to that in Moloney (1972) of proofs written by
students in a CAI course in logic.

The second kind of analysis is a study of the diversity of student
solutions, and in this I am indebted to Dr. Michael Kane (1972) for the
methods that he developed for a similar analysis of proofs produced by
students in the CAI logic course. The use of equivalence relations
reported here is adapted from Kane, but the method of measuring diversity
is somewhat different from Kane's method of measuring variability of
student-written proofs.

Equivalence of programs is discussed in Chapter VII. Four
definitions of equivalence are given, and the 551 correct and nearly correct
solutions found in the data are classified by each of the four definitions.
In Chapter VIII, four measures of diversity of solutions are defined, and
the effect on diversity of several characteristics of the problems and
the curriculum is investigated. Stepwise multiple linear regressions
were then used to develop linear formulas for the prediction of diversity.
The final chapter summarizes the findings of this study and discusses
their implications.

CHAPTER II

Description of Programming Problems



The 25 programming problems used for this study are displayed below,
together with expected correct solutions. The solutions given are those
anticipated as the most likely correct solution to each problem. (The
chapter closes with a discussion of why these solutions were chosen.)
The programs written by students (Appendix A) are discussed in Chapters
VII and VIII.

Although comments on the context in which each problem appeared are
included, there are no explanations of the AID programs. A description
of the language is given in Friend (1973). Each problem is identified
by the lesson number and the problem number used in the course; the identifier
L 16-4, for example, refers to Lesson 16, Problem 4. The optional hints
are shown following the problem statement. A student who asked for a
hint received the first hint listed. A second request brought the second
hint, and so on until the hints were exhausted.



L 5-30:  1 CENTIMETER = .3937 INCHES.  CONVERT THE FOLLOWING LENGTHS
TO CENTIMETERS:
     6.9 INCHES
     7.443 INCHES
     23.9753 INCHES

Hint #1:
     X CENTIMETERS = Y INCHES/.3937
Hint #2:
     TO CONVERT 5 INCHES TO CENTIMETERS, DIVIDE 5 BY .3937.

Correct solution:
     SET K = .3937
     TYPE 6.9/K, 7.445/K, 23.9753/K

Technically, Problem L 5-30 is not a programming problem, because the
concept of a stored program is introduced later, and the student is
expected to use direct (immediately executed) commands to solve the
problem. Only two AID commands, TYPE and SET, have been introduced by
Lesson 5, and those only in their direct form. When the student
solutions were graded for this study, a SET command was not required,
since this was not requested in the problem statement.



L 8-9:  USE A "LET" COMMAND TO DEFINE A FUNCTION THAT GIVES THE
RECIPROCAL OF X.  USE YOUR FUNCTION TO FIND THE RECIPROCAL OF
     119.4
     67.3↑3
     6 + 4

Hint #1:
     THE RECIPROCAL OF X IS 1/X.
Hint #2:
     USE THE COMMAND
          LET R(X) = 1/X
     TO DEFINE THE FUNCTION.
Hint #3:
     USE THESE COMMANDS:
          LET R(X) = 1/X
          TYPE R(119.4)
          TYPE R(67.3↑3)
          TYPE R(6 + 4)

Correct solution:
     LET F(X) = 1/X
     TYPE F(119.4)
     TYPE F(67.3↑3)
     TYPE F(6 + 4)

This problem is more nearly a 'programming problem' than the preceding
one, since it requires the use of a stored formula as a specification
of an algorithm. The use of the LET command was introduced in Lesson 8,
and this is the second problem in which students were required to use
such a command. Unless a student defined and used a function for this
problem, his solution was considered incorrect, since the problem
statement specified that a LET command was to be used.



L 8-27:  DEFINE A "VOLUME" FUNCTION THAT WILL GIVE THE VOLUME OF A
CYLINDRICAL TANK OF RADIUS R AND HEIGHT H.  (VOLUME = 3.1416 TIMES
THE RADIUS SQUARED TIMES THE HEIGHT.)
FIND THE VOLUME OF 2 TANKS:
     TANK A IS 57.5 FEET HIGH AND HAS A RADIUS OF 18.6 FEET.
     TANK B IS 65.4 FEET HIGH AND THE RADIUS IS 19.3 FEET.

Hint #1:
     USE THIS COMMAND TO DEFINE A "VOLUME" FUNCTION:
          LET V(R, H) = 3.1416*R↑2*H
Hint #2:
     AFTER THE VOLUME FUNCTION IS DEFINED, USE THIS COMMAND TO FIND
     THE VOLUME OF TANK A:
          TYPE V(18.6, 57.5)

Correct solution:
     LET V(R,H) = 3.1416*R↑2*H
     TYPE V(18.6,57.5), V(19.3,65.4)

This is the first problem in which a function of two variables is used.
Note that the formula for the volume of a right cylinder is given in
the problem statement; formulas that are likely to be known by most
students (e.g., area of a circle) are not ordinarily given in the
problem, but are given in the optional hints.

L 8-28:  DEFINE A FUNCTION TO CONVERT DEGREES FAHRENHEIT TO DEGREES
CENTIGRADE.  THEN CONVERT THESE TEMPERATURES TO CENTIGRADE:
     0, 10, 32, 72, 212

Hint #1:
     TO CONVERT TO CENTIGRADE SUBTRACT 32 AND MULTIPLY BY 5/9.
Hint #2:
     DEFINE A "CONVERSION TO CENTIGRADE" FUNCTION LIKE THIS:
          LET C(F) = 5/9*(F - 32)
     WHERE F STANDS FOR "DEGREES FAHRENHEIT."

Correct solution:
     LET C(F) = 5/9*(F - 32)
     TYPE C(0), C(10), C(32), C(72), C(212)

This problem is similar to L 8-9, which also required the use of a
function of one variable, although here the formula is somewhat more
complex.

L 9-3:  WRITE A FUNCTION H(A, B) THAT WILL FIND THE LENGTH OF THE
HYPOTENUSE OF A RIGHT TRIANGLE IF THE LENGTHS OF THE OTHER TWO SIDES
ARE GIVEN BY A AND B.  TEST YOUR FUNCTION ON THESE TRIANGLES:
     1.  A = 3, B = 4
     2.  A = 12, B = 12
     3.  A = 1/2, B = 3/4
     4.  A = 9, B = 13.2

Hint #1:
     THE HYPOTENUSE OF A RIGHT TRIANGLE IS EQUAL TO THE SQUARE ROOT OF
     THE SUM OF THE SQUARES OF THE OTHER TWO SIDES.
Hint #2:
     HYPOTENUSE = SQUARE ROOT((SIDE A)↑2 + (SIDE B)↑2)

Correct solution:
     LET H(A, B) = SQRT(A↑2 + B↑2)
     TYPE H(3, 4), H(12, 12), H(1/2, 3/4), H(9, 13.2)

The standard functions SGN, SQRT, IP, and FP were introduced in
Lesson 9; this is the first programming problem that uses SQRT, and
also the first problem that allows the student to define a function
in terms of a standard function. However, since the problem did not
specify using the function SQRT, solutions that used other algebraic
formulations were also considered correct.



L 9-8:  WHEN AN INTEGER M IS DIVIDED BY AN INTEGER N, THERE IS A
QUOTIENT AND A REMAINDER.  WRITE A QUOTIENT FUNCTION Q(M, N).  USE
THE FUNCTION TO FIND THE QUOTIENTS FOR THESE VALUES OF M AND N:
     M = 917?     N = 38
     M = 13       N = 87
     M = 768      N = 101
     M = 6480     N = 15

Hint #1:
     FOR EXAMPLE: 14/3 HAS A QUOTIENT OF 4 AND A REMAINDER OF 2.
Hint #2:
     USE THE IP FUNCTION TO FIND THE QUOTIENT.

Correct solution:
     LET Q(M, N) = IP(M/N)
     TYPE Q(917?,38), Q(13,87), Q(768,101), Q(6480,15)

This is the first problem requiring the use of IP, the 'integer part'
function.
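
For readers unfamiliar with AID, the quotient function can be restated in Python. This is a sketch only: it assumes that IP discards the fractional part (truncating toward zero), and the `remainder` helper is a hypothetical addition that is not part of the expected solution.

```python
import math


def ip(x):
    """Integer part of x; assumed here to truncate toward zero."""
    return math.trunc(x)


def q(m, n):
    """Quotient of m divided by n, in the manner of LET Q(M, N) = IP(M/N)."""
    return ip(m / n)


def remainder(m, n):
    """Hypothetical companion: what is left after removing q(m, n) copies of n."""
    return m - n * q(m, n)


print(q(13, 87), q(6480, 15))
print(q(14, 3), remainder(14, 3))
```

A dividend smaller than the divisor, such as 13 divided by 87, gives a quotient of zero, which is one of the cases the problem's test values exercise.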

L 10-12:  WRITE AN INDIRECT STEP THAT WILL CONVERT MILES PER HOUR TO
FEET PER SECOND.  THEN CONVERT ALL OF THESE TO FEET PER SECOND:
     10 MILES PER HOUR
     100 MILES PER HOUR
     65 MILES PER HOUR
     1023 MILES PER HOUR

Hint:
     IF S STANDS FOR SPEED IN MILES PER HOUR, THIS COMMAND WILL GIVE THE
     SPEED IN FEET PER SECOND.
          TYPE S*5280/(60*60)

Correct solution:
     3.1 TYPE S*5280/(60*60)
     DO STEP 3.1 FOR S = 10, 100, 65, 1023

With the introduction of the concept of stored commands and their
execution in Lesson 10, the first true programming problem in the
course is presented.



L 10-19:  WRITE AN INDIRECT STEP THAT WILL TYPE THE SQUARE ROOT OF X.
DO THE STEP FOR X = 1, 2, 3, ..., 10.

Hint:
     IN THE SEQUENCE
          1, 2, 3, ..., 10
     THE INITIAL VALUE IS 1
     THE STEP SIZE IS 1
     THE FINAL VALUE IS 10

Correct solution:
     5.4 TYPE SQRT(X)
     DO STEP 5.4 FOR X = 1(1)10

This problem is similar to the preceding one, which also requires that
a single stored command be iterated several times. The instruction
intervening between these two problems explains the use of the range
specification in FOR modifiers, and it was anticipated that students
would use that device in solving this problem.

L 11-11:  WRITE A PROGRAM THAT WILL LIST THE RADIUS, DIAMETER,
CIRCUMFERENCE, AND AREA OF A CIRCLE OF RADIUS R.  THEN USE THE
PROGRAM FOR R = 10, 20, 30, 40, AND 50.

Hint:
     USE THESE FORMULAS (R STANDS FOR RADIUS):
          D = 2*R               (D = DIAMETER)
          C = 2*3.14159265*R    (C = CIRCUMFERENCE)
          A = 3.14159265*R↑2    (A = AREA)

Correct solution:
     2.1 SET D = 2*R
     2.2 SET C = 2*3.14159265*R
     2.3 SET A = 3.14159265*R↑2
     2.4 TYPE R, D, C, A
     DO PART 2 FOR R = 10(10)50

In Lesson 11 the student is taught how to write and execute sequences
of stored commands. This is the first problem that requires the student
to write a program consisting of more than a single command.
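
The range specification 10(10)50 and the four stored steps can be restated in Python for readers who do not know AID. This is a sketch under assumptions: `aid_range` and `circle_row` are hypothetical names, and the range is taken to be inclusive of its final value, as the problem statement (R = 10 through 50) implies.

```python
PI = 3.14159265  # the approximation given in the hint


def aid_range(first, step, last):
    """Yield first, first + step, ... up to and including last."""
    value = first
    while value <= last:
        yield value
        value += step


def circle_row(r):
    """Return (radius, diameter, circumference, area) for one value of R."""
    d = 2 * r
    c = 2 * PI * r
    a = PI * r ** 2
    return r, d, c, a


# Counterpart of executing the whole part once for each value in the range.
for r in aid_range(10, 10, 50):
    print(*circle_row(r))
```

Keeping the per-value computation in one function mirrors the way a stored part is written once and then executed repeatedly by a single DO command.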



L 12-4:  WRITE A PROGRAM THAT WILL ASK YOU FOR 3 NUMBERS, A, B, AND C,
AND THEN GIVE YOU THE AVERAGE OF THE 3 NUMBERS.  AFTER YOU HAVE TESTED
YOUR PROGRAM USE IT TO FIND THE AVERAGE OF
     A = 179.053
     B = 23.7
     C = 271.0015

Hint:
     TO FIND THE AVERAGE OF 3 NUMBERS, ADD THE 3 NUMBERS TOGETHER AND
     DIVIDE THE SUM BY 3.

Correct solution:
     2.1 DEMAND A
     2.2 DEMAND B
     2.3 DEMAND C
     2.4 TYPE (A + B + C)/3
     DO PART 2

The DEMAND command has just been introduced and this is the first
problem that gives the student the opportunity to use it.

L 13-29:  WRITE A PROGRAM THAT WILL CONVERT YEARS TO MONTHS.  THE
PROGRAM SHOULD ASK FOR THE NUMBER OF YEARS AND THEN PRINT THE NUMBER
OF MONTHS.
USE YOUR PROGRAM TO FIND THE NUMBER OF MONTHS IN
     2 YEARS
     16 YEARS
     100 YEARS
     2593 YEARS

Correct solution:
     8.1 DEMAND Y
     8.2 TYPE Y*12
     DO PART 8, 4 TIMES

Lesson 13 is one of seven tests contained in the course. Because this
problem is a test item, no hints are provided. Although the problem
is similar to Problem L 12-4, notice that it was anticipated that
students would use a TIMES modifier, which was introduced in Lesson 12.



L 15-15:  WRITE A PROGRAM THAT WILL FIND THE SMALLER OF TWO NUMBERS
X AND Y.

Hint:
     REWRITE THE PROGRAM IN PROBLEM 14 SO THAT IT TYPES THE SMALLER NUMBER
     RATHER THAN THE LARGER.

Correct solution:
     2.1 DEMAND X
     2.2 DEMAND Y
     2.3 TYPE X IF X < Y
     2.4 TYPE Y IF X > Y
     DO PART 2

The problem immediately preceding this one serves as an example of a
program that types the larger of two numbers. The example is identical
to the correct solution given above except that the symbols > and < are
interchanged. The conditional clause was introduced in Lesson 15 and,
except for a problem that requires the student to copy a program given
in its entirety in the text, this is the first problem that uses
conditional commands. In grading the solutions to this problem, the
student's program is not required to provide for the case X = Y.

L 15-17:  WRITE A PROGRAM THAT FINDS THE SMALLEST OF 4 NUMBERS.
CHANGE PART 51 (FROM PROBLEM 16) SO THAT IT FINDS THE SMALLEST OF 4
NUMBERS INSTEAD OF 3.

Correct solution:
     7.1 DEMAND A
     7.2 DEMAND B
     7.3 DEMAND C
     7.4 DEMAND D
     7.5 SET S = A
     7.6 SET S = B IF B < S
     7.7 SET S = C IF C < S
     7.8 SET S = D IF D < S
     7.9 TYPE "THE SMALLEST NUMBER IS"
     7.95 TYPE S
     DO PART 7

This problem requires modification of a similar program (used to find
the smallest of three numbers) given as an example in the exercise that
precedes this one in the curriculum.



L 15-18:  WRITE A PROGRAM THAT TYPES THE LARGER OF 2 NUMBERS AND THEN
THE SMALLER.

Hint:
     REWRITE THE ABOVE PROGRAM SO THAT IT TYPES THE LARGER NUMBER FIRST,
     INSTEAD OF THE SMALLER.

Correct solution:
     1.1 DEMAND A
     1.2 DEMAND B
     1.3 TYPE A, B IF A > B
     1.4 TYPE B, A IF B > A
     1.5 TYPE A IF A = B
     DO PART 1

Again, only a slight modification of a sample program is required. The
text for this problem includes an example of three steps similar to
Steps 1.3, 1.4, and 1.5 above, with < in place of >. Solutions were
considered correct even though they failed to provide for the case A = B.



L 15-21:  WRITE A PROGRAM THAT WILL PRINT "SAME" IF ALL THREE NUMBERS
X, Y, AND Z HAVE THE SAME SIGN.  THE PROGRAM SHOULD PRINT "DIFFERENT"
IF THE NUMBERS DO NOT ALL HAVE THE SAME SIGN.

Hint:
     CAN YOU USE A COMMAND LIKE THIS?
          SET A = 1 IF X > 0 AND Y > 0 AND Z > 0

Correct solution:
     9.1 DEMAND X
     9.2 DEMAND Y
     9.3 DEMAND Z
     9.4 SET A = 0
     9.5 SET A = 1 IF X > 0 AND Y > 0 AND Z > 0
     9.6 SET A = 1 IF X < 0 AND Y < 0 AND Z < 0
     9.7 SET A = 1 IF X = 0 AND Y = 0 AND Z = 0
     9.8 TYPE "SAME" IF A = 1
     9.9 TYPE "DIFFERENT" IF A = 0
     DO PART 9

Preceding this problem is an example of a program that determines
whether or not two numbers have the same sign. That problem is the
first in the course in which conjunctions are used.

L 16-4:  WRITE A PROGRAM THAT WILL DEMAND A RADIUS R AND THEN CALCULATE
THE AREA OF A CIRCLE WITH THAT RADIUS.  USE TWO PARTS, ONE FOR THE MAIN
PROGRAM AND ONE FOR AN ERROR ROUTINE TO BE USED IF R IS NEGATIVE.

Hint #1:
     THE AREA OF A CIRCLE IS 3.14159265 * R↑2.
Hint #2:
     FIRST, GET THE RADIUS BY USING A DEMAND COMMAND.
     SECOND, DECIDE WHETHER OR NOT TO GO TO THE ERROR ROUTINE.
     THIRD, TYPE THE AREA.

Correct solution:
     3.1 DEMAND R
     3.2 TO PART 4 IF R < 0
     3.3 TYPE 3.14159265 * R↑2
     4.1 TYPE "A RADIUS CANNOT BE NEGATIVE"
     DO PART 3

This problem uses the branching command TO, which has just been
introduced. An almost identical error routine is shown in a preceding problem.

L 16-6:  WRITE A PROGRAM THAT WILL TYPE 3 NUMBERS A, B, AND C IN
NUMERIC ORDER.  USE SEVERAL PARTS TO MAKE IT EASIER TO CHECK EACH
PART.

Hint #1:
     PART 1 SHOULD FIND THE SMALLEST OF A, B, AND C, AND THEN
     ...BRANCH TO PART 2 IF A IS SMALLEST.
     ...BRANCH TO PART 3 IF B IS SMALLEST.
     ...BRANCH TO PART 4 IF C IS SMALLEST.
Hint #2:
     PART 2 SHOULD BE USED IF A IS THE SMALLEST OF A, B, AND C.
     PART 2 SHOULD DECIDE WHICH OF B AND C IS THE SMALLEST, ETC.



Correct solution:
     1.1 DEMAND A
     1.2 DEMAND B
     1.3 DEMAND C
     1.4 TO PART 2 IF A <= B AND A <= C
     1.5 TO PART 3 IF B <= A AND B <= C
     1.6 TYPE C, A, B IF A <= B
     1.7 TYPE C, B, A IF A > B
     2.1 TYPE A, B, C IF B <= C
     2.2 TYPE A, C, B IF B > C
     3.1 TYPE B, A, C IF A <= C
     3.2 TYPE B, C, A IF A > C
     DO PART 1


This problem is one of the longest and perhaps the most difficult in
the entire course. The student has used TO in only one other problem
(L 16-4 above), and no similar program is shown in the lesson. In
grading the solutions to this problem the programs were expected to
function only for unequal values of A, B, and C.
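
The branching scheme suggested by the hints (find the smallest number first, then decide the order of the remaining two) can be sketched in Python. The function name is hypothetical, and the sketch illustrates the strategy rather than any particular student's AID program; the loop at the end checks every arrangement of three unequal values, the only case the grading required.

```python
from itertools import permutations


def order_three(a, b, c):
    """Return a, b, c in ascending order by first locating the smallest."""
    if a <= b and a <= c:
        # A is smallest: decide between B and C.
        return (a, b, c) if b <= c else (a, c, b)
    if b <= a and b <= c:
        # B is smallest: decide between A and C.
        return (b, a, c) if a <= c else (b, c, a)
    # Otherwise C is smallest: decide between A and B.
    return (c, a, b) if a <= b else (c, b, a)


for triple in permutations((3, 8, 5)):
    print(triple, "->", order_three(*triple))
```

Each branch needs only one further comparison, which is why splitting the work into several parts keeps every individual command short.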




L 23-7:  WRITE A PROGRAM THAT WILL PRINT THE MULTIPLICATION TABLE
UP TO 5 TIMES 5.

Hint:
     THE PROGRAM SHOULD PRINT SOMETHING LIKE THIS:

     MULTIPLICATION TABLE
      1    2    3    4    5
      2    4    6    8   10
      3    6    9   12   15
      4    8   12   16   20
      5   10   15   20   25

Correct solution:
     4.1 SET X = 1
     4.2 TYPE X, X*2, X*3, X*4, X*5 IN FORM 7
     4.3 SET X = X + 1
     4.4 TO STEP 4.2 IF X <= 5
     FORM 7:
     DO PART 4

This is the first program to use a loop (introduced in Lesson 23);
it was anticipated as a difficult problem because no similar program
was shown previously. The student is allowed to print results in a
form other than the tabular form shown above.



L 24-1H:  WRITE A PROGRAM THAT WILL DEMAND A VALUE FOR N, AND WILL THEN
START AT 1 AND COUNT UP TO N.
FOR EXAMPLE, IF YOU GAVE 7 AS THE VALUE FOR N, THE PROGRAM SHOULD TYPE
     1
     2
     3
     4
     5
     6
     7

Hint:
     THIS PROGRAM IS THE SIMPLEST POSSIBLE KIND OF LOOPING PROGRAM.
     IF YOU CANNOT FIGURE OUT HOW TO DO IT, YOU HAD BETTER GO BACK TO
     THE BEGINNING OF LESSON 23 AND READ THE EXAMPLES VERY CAREFULLY.

Correct solution:
     6.1 DEMAND N
     6.2 SET C = 1
     6.3 TYPE C
     6.4 SET C = C + 1
     6.5 TO STEP 6.3 IF C <= N
     DO PART 6

The use of loops iterated a variable number of times is discussed in
Lesson 24, the second lesson on loops, and this simple problem (with
the unhelpful hint) is the first programming problem in the lesson.



L 25-8:  HERE IS A PROGRAM WITH A LOOP:
     3.1 SET C = 1
     3.2 TYPE 1/C
     3.3 SET C = C + 1
     3.4 TO STEP 3.2 IF C < 8
REWRITE THE PROGRAM SO THAT YOU CAN USE A "FOR" CLAUSE.

Correct solution:
     1.1 TYPE 1/C
     DO STEP 1.1 FOR C = 1(1)7

The equivalence of two methods of iterated execution is discussed in
Lesson 25, and this is the first programming problem requiring such
a transformation.
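
The transformation can be made concrete with a short Python sketch that runs both forms and compares their output. The function names are hypothetical, and the sketch assumes that the explicit loop lets C run from 1 through 7, as the FOR clause in the expected solution implies.

```python
def with_explicit_loop():
    """Initialize, act, increment, then branch back while the test holds."""
    typed = []
    c = 1
    while True:
        typed.append(1 / c)   # the action performed on each pass
        c += 1                # the increment step
        if not c < 8:         # the conditional branch back to the action
            break
    return typed


def with_for_clause():
    """The same action driven by a range of values for C."""
    return [1 / c for c in range(1, 7 + 1)]


print(with_explicit_loop() == with_for_clause())
```

The FOR form removes the initialization, increment, and branch steps from the stored program, leaving only the command that does the work.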



L 26-5:  WRITE A PROGRAM TO CONVERT INCHES TO FEET AND INCHES.  USE A
"DEMAND" COMMAND IN A LOOP.

Correct solution:
     4.1 DEMAND I
     4.2 SET F = IP(I/12)
     4.3 SET I = I - 12*F
     4.4 TYPE F, I IN FORM 4
     4.5 TO STEP 4.1
     FORM 4:
     ___ FEET, ___ INCHES
     DO PART 4

Since the execution of a program containing a DEMAND command can be
halted by answering the DEMAND with a carriage return, seemingly endless
loops that incorporate DEMANDs are acceptable in AID programming. They
are explained in Lesson 26.
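
The same control structure, a loop with no exit test of its own that ends when the request for input is answered with an empty line, can be sketched in Python. The function names are hypothetical, the responses are passed in as a list so that the sketch can be run without a terminal, and IP is assumed to truncate toward zero.

```python
import math


def feet_and_inches(inches):
    """Split a length in inches into whole feet and leftover inches."""
    feet = math.trunc(inches / 12)
    return feet, inches - 12 * feet


def run(responses):
    """Convert each response in turn; an empty response halts the loop."""
    results = []
    for line in responses:
        if line.strip() == "":
            break  # counterpart of answering the request with a carriage return
        results.append(feet_and_inches(float(line)))
    return results


print(run(["30", "12", "", "99"]))
```

Anything supplied after the empty response is never read, which is the behavior that makes the otherwise endless loop acceptable.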



L 29-19:  WRITE A PROGRAM TO FIND WHICH OF THREE NUMBERS A, B, AND C
IS CLOSEST TO 13/17.

Hint:
     USE THE ABSOLUTE VALUE TO FIND THE DISTANCE.  FIRST FIND WHETHER A OR
     B IS CLOSER TO 13/17.  THEN FIND IF THAT ONE OR C IS CLOSER.
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Correct solution: 



, 1.1 
1.2 

1-3 
l.k 



DEMAND A 
DEMAND B 
DEMAND C 
TO PART 2 



lA 
lA 



13/171 
13/17 J 



IF I B - 13/17 1 < = 
1. 5 TO PART 3 IF IC .13/17.' < = 
1.6 TYPE" IS CLOSEST TO 13/17" 
5.1 TO PART 3 IF !C - I3/I7I < = JB - 13/17-' 
.2.2 TYPE IS CLOSEST TO 13/17" ' > ^ 
3.1 TYPE "C IS CLOSEST TO 13/1^" 
^DO PART 1 ^ ■ 



Lesson 29 introduces the AID notation for absolute value, |x|, and
discusses the use of absolute value for finding distances between
points on the number line. No program similar to the above is used
in examples. This problem is probably one of the most difficult in
the course, primarily because the obvious approach (using conjunctions)
produces commands that are too long to be correct AID commands. Solu-
tions need not provide for the possibility that two points are at the
same distance from the fixed point (in fact, the 'correct' solution
shown above does not do this).
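The two-stage comparison suggested in the hint can be sketched in Python as follows. The function name and the `target` parameter are assumptions made for illustration. The sketch deliberately reproduces the lenient treatment of ties described above: when two distances are equal, it simply reports whichever letter the comparison order happens to favor.

```python
TARGET = 13 / 17


def closest(a, b, c, target=TARGET):
    """Return the letter of the value nearest to target, comparing A with B, then the winner with C."""
    da, db, dc = abs(a - target), abs(b - target), abs(c - target)
    if db <= da:
        # B is at least as close as A, so the contest is between B and C.
        return "C" if dc <= db else "B"
    # A is strictly closer than B, so the contest is between A and C.
    return "C" if dc <= da else "A"
```

Each branch needs only one short comparison, which is the point of the hint: a single conjunction testing A against both B and C would be a much longer command.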



L 32-5: SET L EQUAL TO THIS LIST OF NUMBERS:

     1, 7, 14, 2, 5, 21

WRITE A PROGRAM THAT WILL TYPE ALL OF THE NUMBERS AND GIVE THEIR SUM.
THEN CHANGE THE LIST TO THE FOLLOWING AND RUN THE PROGRAM AGAIN.

     5, 50, 100, 0, 1, 2

Hint #1:

USE ONE PROGRAM WITH A DEMAND COMMAND TO SET L EQUAL TO THE LIST.
USE ANOTHER PROGRAM TO TYPE THE NUMBERS AND GIVE THEIR SUM. GET
ANOTHER HINT IF YOU NEED MORE HELP.

Hint #2:

AT THE END OF THE LAST PROBLEM THERE IS A PROGRAM TO SET THE VALUES
OF A LIST. THERE ARE OTHER WAYS TO DO IT AND YOU MAY TRY ANYTHING
YOU LIKE. USE A MAIN PROGRAM AND A SUBROUTINE FOR THE REST OF THE
PROBLEM. THERE IS ANOTHER HINT IF YOU NEED IT.

Hint #3:

IN THE MAIN PROGRAM -- THE COMMAND TO REPEAT THE SUBROUTINE FOR EACH
NUMBER IN THE LIST; THE COMMAND TO PRINT THE SUM.

IN THE SUBROUTINE -- THE COMMAND TO PRINT A NUMBER IN THE LIST; THE
COMMAND TO ADD THAT NUMBER TO THE SUM.

Correct solution:

     2.1 DEMAND X
     2.2 SET L(I) = X
     DO PART 2 FOR I = 1(1)6
     3.1 SET S = 0
     3.2 DO PART 4 FOR I = 1(1)6
     3.3 TYPE S
     4.1 TYPE L(I)
     4.2 SET S = S + L(I)
     DO PART 3

This is the first programming problem in Lesson 32, which introduces
lists and indexed variables. This problem is quite difficult, since
the applicable strategies other than for input are not discussed before
this problem is given.



L 32-8: WRITE A PROGRAM TO FIND THE AVERAGE OF THE NUMBERS IN A LIST
OF TEN NUMBERS. TEST YOUR PROGRAM ON THESE TWO LISTS:

     A. -10, 0, 1, 5, -3, 28, 17, 6, 11, -7

     B. -.4, 2.5, 3.1, 5.8, 0, 7.1, 4, 8.9, 2, 3.1

Hint:

AVERAGE = SUM OF VALUES IN LIST / NUMBER OF VALUES IN LIST.

Correct solution:

     3.1 DEMAND X
     3.2 SET L(I) = X
     DO PART 3 FOR I = 1(1)10
     4.1 SET S = 0
     4.2 DO STEP 5.1 FOR I = 1(1)10
     4.3 TYPE S/10
     5.1 SET S = S + L(I)
     DO PART 4

This problem should not be difficult for most students, since the pre-
ceding exercise shows an example of a program that computes the average
of the numbers in a list of five numbers.




L 32-19: WRITE A PROGRAM TO FIND AND PRINT ALL THE NUMBERS LESS THAN
30 IN A LIST OF 10 NUMBERS. TEST YOUR PROGRAM ON THIS LIST:

     10, 40, 39, 19, 28, 31, 30, 29.999, 16, 37






Correct solution:

     5.1 DEMAND X
     5.2 SET L(I) = X
     DO PART 5 FOR I = 1(1)10
     1.1 DO PART 2 FOR I = 1(1)10
     2.1 TYPE L(I) IF L(I) < 30
     DO PART 1

This program is simpler than the preceding one (L 32-8), but no model
is given in the lesson. Solutions that use <= in place of < were con-
sidered incorrect.



This concludes the list of programming problems used in this study.
None of the anticipated solutions are lengthy programs; the longest con-
tains 12 commands and most require two to five commands. Although many
of these problems appear simple, the students did not find them so. For
this reason, solutions were not graded strictly; in some cases programs
were considered correct even though they contained discontinuities not
present in the expected solutions (for example, failure to account for
the case X = Y in Problem L 15-15).

I turn now to a discussion of the construction of expected correct
solutions. The criteria used for this task were not completely objective.
The rules followed are listed in the order in which they were applied.

1. Only lexical elements and grammatical constructions that had
been previously taught were allowed.

2. If a correct program or part of a program was shown in the
problem statement, or in one of the hints, it was used.

3. If an applicable strategy was mentioned in the problem statement,
or in one of the hints, that strategy was used.

4. If a similar program or part of a similar program was shown in
the problem statement, or in one of the preceding four exercises, it
was adopted, provided rules 2 and 3 were not violated.

5. If an applicable new AID construction was introduced in the
current lesson, it was used provided none of the above rules were
violated.

6. Of the correct solutions that satisfied the above requirements,
the shortest was chosen. The shortest solution is defined as that which
requires the fewest characters for the program and the commands necessary
to execute it four times.
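The length measure in rule 6 can be made concrete with a short sketch. The Python below is an illustration only; the function names, the dictionary keys, and the decision to count spaces as characters are assumptions, since the report does not say how spaces were treated.

```python
def solution_cost(stored_lines, run_commands, executions=4):
    """Characters in the stored program plus the commands typed to run it `executions` times."""
    program_chars = sum(len(line) for line in stored_lines)
    run_chars = sum(len(command) for command in run_commands)
    return program_chars + executions * run_chars


def shortest(candidates):
    """Pick the candidate with the lowest cost; each candidate has 'stored' and 'run' lists."""
    return min(candidates, key=lambda c: solution_cost(c["stored"], c["run"]))
```

Weighting the run commands by four means that a solution needing several direct commands per execution is penalized relative to one that stores everything and is started by a single DO command.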

The expected correct solutions obtained by these rules are neither
the shortest nor the most efficient solutions, and in a number of cases
students produced more elegant programs. How well students' solutions
were predicted is shown in Table 1. For 15 problems the expected solu-
tion was the most common correct solution; however, this was not always
the case, and for five problems, no student gave the anticipated solution.
A student's solution was considered equivalent to the expected solution
if it differed by no more than optional spaces, names for variables, and
step numbers. (This conception of program equivalence is the first one
discussed in Chapter VII and a more precise definition is given there.)
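One way to picture this notion of equivalence is as a normalization that erases exactly the three permitted differences before two programs are compared. The Python below is a rough sketch under several assumptions, not the definition given in Chapter VII: variables are taken to be single letters, any token of the form digits.digits is treated as a step number (so a decimal constant such as 2.5 would be misread), and part numbers are recognized only after the word PART.

```python
import re

TOKEN = re.compile(r"\d+\.\d+|[A-Za-z]+|\d+|\S")


def normalize(program_lines):
    """Rewrite a program so that spacing, variable names, and step numbers no longer matter."""
    names, parts, steps, out = {}, {}, {}, []

    def part_id(part):
        return parts.setdefault(part, f"p{len(parts)}")

    for line in program_lines:
        canon, prev = [], ""
        for tok in TOKEN.findall(line):
            if re.fullmatch(r"\d+\.\d+", tok):
                # Step number: renumber by order of first appearance.
                part = part_id(tok.split(".")[0])
                canon.append(steps.setdefault(tok, f"{part}.s{len(steps)}"))
            elif prev == "PART" and tok.isdigit():
                canon.append(part_id(tok))
            elif re.fullmatch(r"[A-Za-z]", tok):
                # Single-letter variable: rename by order of first appearance.
                canon.append(names.setdefault(tok.upper(), f"v{len(names)}"))
            else:
                canon.append(tok.upper())
            prev = tok.upper()
        out.append(" ".join(canon))
    return out


def equivalent(program_a, program_b):
    return normalize(program_a) == normalize(program_b)
```

Two counting programs that differ only in part number, variable letters, and spacing normalize to the same text, while a program that types a different variable does not.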






                              Table 1

            Comparison of Expected with Student-written
                         Correct Solutions

            Problems for which the expected solution

   was most frequent     occurred but was       did not occur
                         not most frequent

   L 5-30                L 8-9                  L 11-11
   L 8-27                L 8-28                 L 15-21
   L 9-3                 L 9-8                  L 29-19
   L 10-12               L 15-18                L 32-8
   L 10-19               L 26-5                 L 32-19
   L 12-4
   L 13-29
   L 15-15
   L 15-17
   L 16-4
   L 16-6*
   L 23-7**
   L 24-11
   L 25-8
   L 32-5

*The expected solution was given by only one student but no
other correct solution occurred more frequently.

**The expected solution was given by two students.
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CHAPTER III 
Description of the Data 



The data consist of the (recorded) work performed by 40 students
on 25 programming problems. Students worked a total of 747 problems.
The distribution of number of problems attempted is shown in Figure 1.
The number ranges from five to 25, with a mean of 19. Because the
1971-72 AID course was student-controlled, students were permitted to
work problems in any order or to skip problems; thus, some students who
completed all 32 lessons did not attempt every problem. In addition,
several students did not complete 32 lessons, resulting in a steady
decline from the first problem to the last in the number of students
attempting a problem. This distribution, the number of students
attempting each problem, is shown in Figure 2. The number of attempted
solutions for a problem ranged from 14 to 38, with a mean of 30. Because
of the high variance in the number of problems attempted by each student
and in the number of students who attempted each problem, most of the
statistics cited hereafter are given as proportions.

The fact that a student attempted a problem tells us little about
how much work he did or how close he came to solving the problem. The
correctness of students' solutions is discussed in the next chapter. A
good indicator of the amount of effort expended is the number of AID
commands typed while attempting to solve a problem. A total of 7063
commands were typed by all students. The number of commands typed for
a problem ranged from 1 to 72, with an average of 9.5. The average
number of commands typed for each problem is shown in Table 2.



                              Table 2

        Comparisons of Number of Commands Typed and Executed
               for Observed and Anticipated Solutions

                          Number of Commands

Problem    Typed in student    In anticipated       Executed in student
number     solutions           correct solutions    solutions
           (average)                                (average)

L5-30           3.7                   2                    3.5
L8-9            3.2                   4                    2.9
L8-27           4.5                   2                    --
L8-28           --                    2                    4.6
L9-3            7.1                   2                    5.3
L9-8            --                    2                    4.4
L10-12          2.9                   2                    2.3
L10-19          3.5                   2                    2.8
L11-11          7.7                   5                    5.9
L12-4           7.8                   5                    6.8
L13-29          6.6                   3                    4.4
L15-15         11.2                   5                    7.1
L15-17         13.3                  11                   10.3
L15-18          9.1                   6                    7.1
L15-21         15.0                  10                   11.8
L16-4           9.5                   5                    7.4
L16-6          22.6                  12                   14.3
L23-7          14.2                   6                   11.7
L24-11         10.6                   6                    8.2
L25-8           5.0                   2                    3.6
L26-5          10.1                   7                    8.0
L29-19         14.2                  10                   14.7
L32-5          23.3                   9                    --
L32-8          23.5                   8                   15.7
L32-19         16.4                   6                   12.0

Mean:          10.22                  5.36                 7.44
S.D.:           6.26                  3.17                 3.99

Correlation with anticipated:  typed, r = .834;  executed, r = .821

As one would expect, the average number of commands typed per
problem varies considerably from one problem to the next. As shown in
Table 2, the average for Problem L8-9 is 3.2 lines, while the average
for L32-8 is 23.5 lines. A comparison between the average number of
commands typed and the number of commands used in the anticipated correct
solutions is shown in Table 2. The number of commands typed is consis-
tently greater than, in fact almost double, the number of commands in the
anticipated solutions. One must conclude that the students' attempts to
solve these problems were not cursory efforts. The correlation between
expected and observed values is quite good; r = .834.
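The correlation quoted here is the ordinary product-moment coefficient between two columns of Table 2. A minimal sketch of that computation in Python follows; the function name is an assumption, and the sample lists contain only a handful of legible rows (L5-30, L12-4, L15-17, L15-21, L26-5), so the result is not expected to reproduce the reported value for all 25 problems.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# A few rows of Table 2: average commands typed and anticipated commands.
typed_sample = [3.7, 7.8, 13.3, 15.0, 10.1]
anticipated_sample = [2, 5, 11, 10, 7]
r_sample = pearson(typed_sample, anticipated_sample)
```

The same function applied to the executed column in place of the typed column gives the second coefficient discussed below.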

Perhaps a more useful measure of the amount of effort expended than
the number of commands typed is the number of commands executed. Of the
7063 typed commands, 5177 were executed. Thus, 1886 of the typed com-
mands, 26 percent of the total, were unused, either because no attempt
was made to execute them or because they contained errors that prevented
their execution. We see in Table 2 that the number of commands in the
anticipated correct solutions is less than the number executed, although
the difference is not as great as that for typed commands. There are,
however, three problems for which the average number of executed commands
is less than the number used in the anticipated solution. One would
expect a higher correlation for executed commands than for typed commands,
but it is slightly lower, .821 compared with .834.

We can characterize the commands that constitute the data by looking
at their function as programming commands, which may be done in two ways.
First, we can classify commands according to whether they are direct
(immediately executed) or indirect (stored) commands. Second, we can
classify commands according to the AID verb used. The number of occur-
rences of the different kinds of commands, using both methods of class-
ification, is shown in Table 3. A command does not appear in the data
before it is introduced in the curriculum; that is, no indirect commands
are used before Lesson 10, no DEMAND commands are used before Lesson 12,
etc. With the exception of the LET command, which is used heavily in
earlier problems and less in later problems, the number of occurrences
of a given kind of command remains fairly constant after the command's
introduction (although we see some large fluctuations from one problem
to the next). Thus, though the total number of commands tends to in-
crease with lesson number, the increase is due to a greater number of
different commands used, rather than increased frequency of use. This
is partly due to the nature of the curriculum, in which an effort was
made to arrange problems and lessons so that a command once introduced
was used frequently thereafter. That this did not occur with LET
indicated a weakness in the curriculum, which was subsequently corrected.
We turn now to a comparison of the proportions of types of commands
observed in the data and the proportions used in the anticipated correct
solutions as shown in Table 4. Looking first at the classification by
verb (second part of Table 4) we see that the anticipated solutions do
not contain any occurrences of DELETE or of the file commands, and
therefore they cannot serve as predictors of the number of occurrences
of these kinds of commands. Even so, the match between predicted and
observed is extremely good. The correlation coefficient is .958. The
only two marked discrepancies are for SET and DEMAND; there is a higher
proportion of SET commands in the data than in the expected solutions
and a lower proportion of DEMAND commands. One reason for this is that
students sometimes used direct SET commands as a means of input where
the anticipated solutions used DEMAND. Another reason is that students
undoubtedly made fewer errors in DEMAND commands than in other AID
commands, both in syntax and in semantics, and therefore retyped them
less frequently. A study either of the proportions of different kinds
of commands used in correct solutions or of the distribution of errors
over command types would examine this hypothesis. Neither of these
studies has yet been undertaken.

                              Table 4

          Comparison of Observed and Expected Proportions
                  of Different Kinds of Commands

                      Number of Occurrences         Proportion
Kind of Command       Observed*   Expected**    Observed   Expected

Direct                   80.1         38          45.4%      28.4%
Indirect                 96.5         96          54.6%      71.6%

Total                   176.6        134

TYPE                     53.6         41          30.4%      30.6%
SET                      37.4         25          21.2%      18.7%
DEMAND                   21.4         27          12.1%      20.1%
TO                        8.8          9           5.0%       6.7%
DO                       33.8         25          19.1%      18.7%
LET                       7.5          5           4.2%       3.7%
DELETE                    4.6         --           2.6%       --
FORM                      2.5          2           1.4%       1.5%
File commands***          1.5         --           0.8%       --
Unidentifiable            5.5         --           3.1%       --

Total                   176.6        134

Mean                     17.7         13.4
S.D.                     18.1         14.8
                            r = .958

*The observed number of occurrences of the different kinds of
commands is the total number of such occurrences divided by 40
(the number of students who contributed to the data).

**The expected number of occurrences of the different kinds of
commands are taken from the expected correct solutions listed
in Chapter II.

***USE, FILE, RECALL, and DISCARD

Looking at the comparison of the proportions of direct and indirect
commands (also shown in Table 4), we see that the anticipated correct
solutions do not function well as predictors of the observed proportions;
students used a much higher proportion of direct commands than did the
anticipated solutions. This can be explained partly by the fact that
direct commands play a larger role in debugging than in writing programs,
and the anticipated solutions cannot be expected to serve as predictors
of commands used for debugging purposes. Another reason for the high
proportion of direct commands in students' work is that students fre-
quently omit the step number in what was intended as an indirect command.
Since the criterion for classifying a command as either direct or in-
direct is the absence or presence of a step number, these erroneous
commands were incorrectly classified.
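The classification criterion just described, and the misclassification it causes, can be shown with a few lines of Python. This is an illustrative sketch; the regular expression and the function name are assumptions about how a step number might be recognized, not the procedure actually used in the study.

```python
import re

# A step number is assumed to look like digits, a period, digits, then a space.
STEP_NUMBER = re.compile(r"^\s*\d+\.\d+\s")


def classify(command):
    """Label a command 'indirect' if it begins with a step number, else 'direct'."""
    return "indirect" if STEP_NUMBER.match(command) else "direct"
```

Under this rule `classify("6.2 SET C = 1")` is indirect and `classify("DO PART 6")` is direct, as intended. A student who meant to store `SET C = 1` as a step but forgot the number is counted as having typed a direct command, which is the source of the inflated direct-command proportion noted above.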
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• CHAPTER IV 
Distribution of Correct Solutions

Perhaps the single most important question to be answered by this
study is: How many of the students solved the problems? It is to this
question that this chapter is addressed.

In order to answer the question, one must first establish criteria
for correctness, not a trivial task for programming problems. We could
beg the question by referring the reader to Appendix A (which contains
a list of all the correct solutions found in the data under consideration).
Any solution in that list is correct. Thus, membership in the list is a
sufficient condition for correctness, but not a necessary one. Rather
than give a complete and exhaustive list of the criteria used in grading
students' work, we will give an informal description of the attributes
we looked for.

First, each correct program must perform a 'minimal' function. For
most problems this function is defined by the correct solution listed in
Chapter II, but for a few problems, the minimal function is a subset of
that anticipated function. For example, for Problem L15-15, which asked
for a program that would find the smaller of two numbers, the function
defined by the anticipated correct solution is

$$ f(x, y) = \begin{cases} x & \text{if } x < y \\ y & \text{if } x \ge y \end{cases} $$

In grading students' work for this problem, the minimal function used
was

$$ f_m(x, y) = \begin{cases} x & \text{if } x < y \\ y & \text{if } x > y \end{cases} $$

The domain of $f_m$ excludes pairs (x, x), and hence $f_m$ is a proper subset
of $f$. The comments given on grading in Chapter II, along with the cor-
rect solutions listed there, are sufficient to imply the minimal function
that was used for each problem, so a complete list of minimal functions
is not given here.
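Grading against a minimal function amounts to checking a student's program only on inputs where the minimal function is defined. A hedged sketch in Python, using the smaller-of-two-numbers problem as the example, is given below; the function names are hypothetical and a Python callable stands in for the student's AID program.

```python
def minimal_smaller(x, y):
    """Minimal function for the smaller-of-two-numbers problem; undefined (None) when x == y."""
    if x < y:
        return x
    if x > y:
        return y
    return None


def satisfies_minimal(candidate, test_pairs):
    """True if candidate agrees with the minimal function wherever that function is defined."""
    for x, y in test_pairs:
        expected = minimal_smaller(x, y)
        if expected is not None and candidate(x, y) != expected:
            return False
    return True
```

A program that behaves arbitrarily on equal inputs still passes, because pairs of the form (x, x) lie outside the domain being checked, whereas a program that returns the larger value fails on the first unequal pair.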

In one case (Problem L5-30) students were asked merely to compute
three numerical results, and any method (other than computation by hand)
that produced the three correct values was considered correct, so that
the minimal function for this problem contains only these pairs. For
all other problems, a solution was not considered correct unless it was
a general solution; that is, the domain of the minimal function is quite
large. In a few instances, students' programs defined functions that
included the anticipated function as well as the minimal function. In
other words, the students' solutions were better than the minimal one;
for an example of this, see the solution to Problem L23-7 (Appendix A).

The computational algorithm used by the student could be defined
either as a stored program or as a user-defined function, and in general,
the student used the same device as in the anticipated solutions. In
either case, the students' solutions were required to print values as
well as to compute them.

We have been discussing functions defined by programs as if they
were real-valued functions. In fact, these functions ordinarily have
as values text strings in which numeric values may or may not be imbedded.
Consider, for example, Problem L15-21, which asks for a program that will
print either 'SAME' or 'DIFFERENT' depending upon a comparison of the
signs of three numbers. Or, as another example, Problem L16-4 requires
a subroutine to print an error message if a negative value is given as
the radius of the circle. Problem L29-19 also requires text output.
For these three problems a student's program was considered correct if
it typed text with the appropriate content. Thus, for Problem L16-4,
these error messages would all be considered equivalent and correct:

     A RADIUS CANNOT BE NEGATIVE.

     DON'T USE NEGATIVE NUMBERS.

     YOU'RE NUTS!

Such decisions about equivalence of text are, of course, easy in hand
grading but present great difficulties to an automated procedure for
grading programs. Other than the three problems just mentioned, the
minimal function used in grading did not include text. However, students'
programs frequently provided for more than the minimal output. Often
this was done by printing input values as well as output values. For
example, one program to convert inches to feet and inches printed the
result in the form

     27 INCHES EQUALS 2 FEET AND 3 INCHES

rather than the simpler

     2 FEET AND 3 INCHES

used by most programs. In grading, the context of the output values was
ignored if it was not required by the minimal function.

In addition to computing and printing correct values, students'
solutions were also required to handle input reasonably. Input in AID
programming can be managed in two ways. Using DEMAND commands, a program
can request the input data it needs at execution time. A second method
is to store data in the program before execution, by means of direct SET
commands, a FOR modifier, or an auxiliary program that uses SET or DEMAND
commands. Using either method, a student's solution was not judged to be
correct unless it was executed correctly; for those problems that speci-
fied input values, the student's solution was considered correct only if
he executed his program successfully for each specified value.

For certain problems, additional criteria of correctness were im-
posed. A problem statement might contain explicit instructions to use
a specified kind of command or programming structure; for example,
Problem L8-9 requires a LET command, as does Problem L8-27; Problem L26-5
requires a loop incorporating a DEMAND command. Requirements implied
but not explicitly stated in a problem statement are not taken as abso-
lute, however. Thus, Problem L23-7, which asks for a program to print
part of the multiplication table, did not require the output in tabular
form, though the use of the word 'table' implied that it should.

For a few cases the standards described above proved inadequate in
some way, primarily for the last three problems, which required the use
of indexed variables. Solutions to such problems must be studied in
much more depth and with more data before strategies for an automated
program check can be devised.

In checking for correct solutions, all trials for the first en-
counter with a problem were inspected until a correct solution was
found. Table 5 summarizes the performance on each problem, showing the
number of correct solutions for the first trial, and the number correct
over all trials.

                              Table 5

      Comparison of Performance on First Trial and All Trials

Problem      Number of Correct Solutions      Proportion Correct*
number       First trial     All trials       First trial

L5-30            17              35               .47
L8-9             31              36               .84
L8-27            --              29               .66
L8-28            21              28               .62
L9-3             19              29               .59
L9-8             12              19               .35
L10-12           27              33               .75
L10-19           30              34               .86
L11-11           22              26               --
L12-4            28              --               .80
L13-29           21              21               .60
L15-15           15              19               .48
L15-17           18              19               .64
L15-18           11              11               .41
L15-21           20              22               .69
L16-4            24              25               .77
L16-6             7               9               .24
L23-7            12              16               .50
L24-11            8               8               .38
L25-8            18              18               .72
L26-5            15              17               .54
L29-19            3               3               .12
L32-5             7               7               .27
L32-8            10              10               .43
L32-19            6               7               --

Total           427             515

*Uses number of students attempting problem as denominator.

Of the 747 attempted solutions, 427 (57%) were correct on the first
trial and 515 (69%) were correct on some trial. The differences between
performance on first and subsequent trials are not great, except for
three problems (L5-30, L9-3, and L9-8) in which the second figure is at
least 50% higher than the first. The score for each student was computed
for first trials; these scores range from 10% to 92%. The distribution
of students' scores is shown in Figure 3. Two conclusions can be drawn:
one, insofar as variance is an indicator, this set of programming
problems was well chosen as a test, and two, the students did not find
these problems easy. Although the performance of these same students on
other exercises in the AID course has not been analyzed in detail, the
average score on all exercises in the course is over 75%, considerably
higher than the 57% for the set of programming problems considered here.

The proportion correct, shown in the third column of Table 5, is
used in Chapter VI as the primary measure of problem difficulty. Com-
paring the proportions correct for different problems we note a range
of .12 to .86. The three most difficult problems by this criterion are
L16-6, L29-19, and L32-5. Both L16-6 and L29-19 are logically complex
problems requiring the use of several conditional branches. L32-5 is
the first problem using indexed variables, and the fact that it is quite
difficult probably indicates the inadequacy of the curriculum rather than
the inherent difficulty of the problem. A more detailed study of problem
difficulty is pursued in Chapter VI.

In addition to judging solutions by a simple correct-incorrect scheme,
we found that some system of assigning partial credit was also desirable.
The one used here is a simple count of the number of commands used in a
correct or partially correct solution. This is not a completely satis-
factory system, but it does have the virtue of providing a fairly
objective measure. Using this measure of correctness, we were able to
determine what proportion of the effort expended by students was useful
effort. Table 6 shows the average number of commands used in correct
and partially correct solutions for each problem. The criterion used
in tallying the commands for Table 6 was not only that the command con-
tribute to a correct solution, but also that the command be executed.
Originally correct commands that were replaced by the student before
execution did not contribute to these statistics, nor did commands that
were stored but not executed.

In comparing the statistics from Table 6 with those in Table 2
(number of commands typed), note that less than half (3404/7063) of the
commands typed were used in correct or partially correct solutions.
Students typed an average of 9.5 commands for the problems attempted,
but only 4.6 commands contributed toward a correct solution. Looking at
the totals for different problems, we see that for three problems (L29-19,
L32-5, and L32-19) fewer than one-third of typed commands contributed
to correct solutions. Two of these problems are from Lesson 32, which
introduces indexed variables, and again we attribute this to a weakness
in the curriculum rather than to characteristics of the problems them-
selves. This supposition might be confirmed by comparison with data
from similar problems in the 1972-73 AID course in which the lessons on
indexed variables were substantially revised.

                              Table 6

            Number of Commands Used in Anticipated and
              Correct and Partially Correct Solutions

                     Average Number of Commands*

Problem      Anticipated           Correct and partially
number       correct solution      correct solutions

L5-30              2                     2.2
L8-9               4                     2.6
L8-27              2                     2.7
L8-28              2                     3.1
L9-3               2                     --
L9-8               2                     4.0
L10-12             2                     2.0
L10-19             2                     2.1
L11-11             5                     4.0
L12-4              5                     4.1
L13-29             3                     2.7
L15-15             5                     --
L15-17            11                     8.5
L15-18             6                     --
L15-21            10                     9.9
L16-4              5                     5.8
L16-6             12                     8.0
L23-7              6                     6.2
L24-11             6                     5.9
L25-8              2                     1.7
L26-5              7                     6.5
L29-19            10                     3.7
L32-5              9                     5.1
L32-8              8                    10.5
L32-19             6                     5.0

*Used 3404 typed commands.

A similar comparison can be made between the number of commands
used in correct or partially correct solutions and the number of executed
commands, which are shown in Table 2. This comparison is more meaningful
than the preceding one because the commands tallied in Table 6 had to be
executed, and thus we are comparing the number of executed commands that
contributed toward a correct solution with the total number of executed
commands. Of the commands that were executed, two-thirds (3404/5177)
contributed toward a correct solution. For a detailed comparison, look
again at the averages for the individual problems. We find that fewer
than half of the executed commands contributed toward correct solutions
for three problems: L25-8, L32-5, and L32-19. The last two of these are
the same two (quite difficult) problems from Lesson 32 for which the dis-
crepancy with commands typed was so marked. Interestingly, the other
problem, L25-8, is one of the easiest problems in the set with a proba-
bility correct of .72. Looking at discrepancies at the other end of the
scale, we find that 90% of the executed commands contributed to correct
solutions for Problems L8-9 and L9-8. For L8-9, 84% of the students
produced correct solutions on their first trial, but for L9-8 that figure
is only 35%. Obviously no simple relation exists between these different
measures, and a more detailed analysis of the relationships is undertaken
in Chapter VI.
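The aggregate ratios quoted in this discussion follow directly from the totals given in the text. A small Python check, offered only as a worked restatement of that arithmetic (the variable names are assumptions), is shown below.

```python
# Totals stated in the text.
PROBLEMS_ATTEMPTED = 747
COMMANDS_TYPED = 7063
COMMANDS_EXECUTED = 5177
COMMANDS_USEFUL = 3404   # used in correct or partially correct solutions

typed_per_problem = COMMANDS_TYPED / PROBLEMS_ATTEMPTED      # about 9.5
useful_per_problem = COMMANDS_USEFUL / PROBLEMS_ATTEMPTED    # about 4.6
useful_share_of_typed = COMMANDS_USEFUL / COMMANDS_TYPED     # a little under one half
useful_share_of_executed = COMMANDS_USEFUL / COMMANDS_EXECUTED  # roughly two-thirds
```

The same division applied to a single row of Tables 2 and 6 gives the per-problem shares; for L9-8, for instance, 4.0 useful commands against 4.4 executed commands is about 90%.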

In the correct-incorrect grading we allowed only completely correct
solutions. However, relaxing these standards somewhat, we can define
another variable, allowing as correct those programs that are correct
up to algebraic expressions. By disregarding algebraic errors, we can
obtain additional, possibly better, evidence of the programming difficulty
represented by the different problems. The pertinent summary statistics
for this variable are shown in Table 7. The number of solutions that
were correct, except for algebraic errors, is shown separately from the
total in order to emphasize the variance. The correlation between these
two measures of proportion correct is quite high (r = .904). Notice that
15 students solved Problem L5-30 correctly, except for algebraic errors;
this nearly equals the number who solved the problem completely correctly
(17), and changes the proportion correct for that problem from .47 to .89.
Other changes are less impressive, but several others are also substantial:
the proportion correct for L9-8 changes from .35 to .56, for L10-12 from
.78 to .92, and for L24-11 from .38 to .52. For nine of the problems,
no change in proportion correct is achieved by modifying the definition
of correctness.

In some cases the definition of the minimal function, discussed
earlier, markedly affected the measures derived for proportion correct.
The problems that would be most noticeably affected if the criteria were
more stringent are L15-15, L15-18, L16-6, and L29-19, in all of which
the students were allowed to ignore the possibility that different input
variables might have equal values. For all four of these problems, fewer
than half of the attempted solutions were graded correct, and this
proportion would decrease substantially with any increase in the
stringency of the criteria.
                              Table 7

          Number of Solutions That Were Correct Except for
                   Algebraic Errors (first trials)

                                   Number of solutions*
           Number of solutions     that were correct or
Problem    that were correct       correct except          Proportion
number     except for algebra      for algebra             correct**

L5-30             15                     32                   .889
L8-9            [ill.]                   33                   .892
L8-27           [ill.]                   27                   .711
L8-28              4                     25                   .735
L9-3               1                     20                   .625
L9-8               7                   [ill.]                 .559
L10-12             5                   [ill.]                 .917
L10-19             2                     32                   .914
L11-11             0                     22                   .667
L12-4              5                     33                   .943
L13-29             0                     21                   .600
L15-15             1                     16                   .516
L15-17             0                   [ill.]                 .643
L15-18             0                     11                   .407
L15-21             0                     20                   .690
L16-4              2                     26                   .839
L16-6              1                   [ill.]                 .310
L23-7              0                     12                   .500
L24-11             3                   [ill.]                 .524
L25-8              3                     21                   .840
L26-5              1                   [ill.]                 .571
L29-19             1                      4                   .154
L32-5              0                      7                   .269
L32-8              0                     10                   .435
L32-19             0                      6                   .429

 *Sum of the number of solutions that were correct except for algebraic
  errors (taken from the preceding column) and the number of solutions
  that were completely correct.

**The proportion correct for each problem is calculated by the formula:

  (total number of solutions that were correct or correct except for
  algebra) ÷ (number of students who attempted the problem).

  [ill.] = entry illegible in the source copy.
CHAPTER V

Errors

The primary reason for undertaking an analysis of errors was to
provide variety in the means of measuring problem difficulty. In
addition, a study of errors provides insights into student performance
not obtained from examinations of proportion correct and distribution of
correct solutions.

We considered mainly overt errors; errors of omission are not
examined, except for failure to provide for program output. A student
may fail to solve a problem correctly, but at the same time make no
overt errors. If his work is correct but not complete, the student is
not credited with either a correct solution or any overt errors. Even
more dramatically, a student may produce a large number of commands that
are correct in the sense that they contain no errors and yet may not be
credited with any commands that contribute to a correct solution because
his work has no identifiable relation to the problem he is supposedly
solving. There were a number of such instances in the data. In some
cases, the student was clearly working on a completely unrelated
program, e.g., on a previous problem, perhaps, or even one of his own
choosing (one student spent considerable time writing a game-playing
program that had no relation to any of the problems in the course).

In this chapter we use two methods to derive statistics. In the
first method, all errors regardless of their source are classified by
type, and in the second, we show for each problem the number of students
who made errors, though not the number of errors made by each student.
There were 1090 errors in the 7063 commands typed by students, with
some commands containing more than one error. Of the 1090 errors, 740
(68%) were syntax errors and 350 (32%) were semantic errors. Note that
because we are concerned only with overt errors the proportion of
syntactic errors is probably overestimated. This method is more likely
to fail to count semantic than syntactic errors.
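
The split between the two kinds of error can be reproduced directly from
the counts given above. This is only a sketch in Python of the
arithmetic; the constant and function names are illustrative and do not
come from the study.

```python
# Counts reported in the text.
TOTAL_ERRORS = 1090
SYNTAX_ERRORS = 740
SEMANTIC_ERRORS = 350


def share(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# The two classes are meant to partition the total.
assert SYNTAX_ERRORS + SEMANTIC_ERRORS == TOTAL_ERRORS

syntax_share = share(SYNTAX_ERRORS, TOTAL_ERRORS)
semantic_share = share(SEMANTIC_ERRORS, TOTAL_ERRORS)
print(f"syntax: {syntax_share}%  semantic: {semantic_share}%")
```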

Syntax errors were divided into seven major classes containing 22
subclasses, and the distribution of errors into these classes is shown
in Table 8. Format errors, which accounted for 12.0% of the total, are
of four types.

1. Line too long (3.0%). AID commands must be contained within
lines of 72 characters or less. If a typed line exceeds 72 characters,
an error message is given by the interpreter.

2. Omitted space (5.9%). One or more spaces are required as
delimiters after step numbers, after verbs, on both sides of IF, AS,
FOR, etc.

3. Inserted space (1.6%). Spaces are not allowed before the left
parenthesis in expressions like F(X) and F(3), where F is either a
user-defined function or a standard AID function; all of the observed
errors were of this type. Nor are spaces allowed before the left
parenthesis in expressions like X(2) and L(1,4) where X and L are
indexed variables, but there were no occurrences of this error.

4. Visible delimiter errors (1.5%). Visible delimiters such as
commas and semicolons are required in specific commands. Some of the
errors in this subclass were errors of omission and others were errors
of substitution.
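
Two of the format rules above are mechanical enough to sketch as checks.
The Python below is an illustration only, not the AID interpreter's
logic: the 72-character limit is taken from the text, while the regular
expression is a simplifying assumption that treats any single-letter
name followed by blanks and a left parenthesis as an inserted-space
error.

```python
import re

MAX_LINE_LENGTH = 72  # limit stated in the text

# Assumption: function and indexed-variable names are single letters.
_SPACE_BEFORE_PAREN = re.compile(r"\b[A-Za-z]\s+\(")


def line_too_long(line: str) -> bool:
    """True if the typed line exceeds the 72-character limit."""
    return len(line) > MAX_LINE_LENGTH


def has_inserted_space(line: str) -> bool:
    """True if a single-letter name is separated from '(' by blanks."""
    return _SPACE_BEFORE_PAREN.search(line) is not None


print(has_inserted_space("TYPE F (3)"))
print(has_inserted_space("TYPE F(3)"))
print(line_too_long("X" * 73))
```

The check deliberately ignores a parenthesis that follows a verb, as in
TYPE (X+1), since the space there is a required delimiter.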
                              Table 8

                  Classification of Syntax Errors

                                               Number of   Percent of
Classification                                  errors    total errors

I.   Errors in Format
     A. Line too long                             22           3.0
     B. Omitted space                             44           5.9
     C. Inserted space                            12           1.6
     D. Delimiter error                           11           1.5
        Total                                     89          12.0

II.  Transient Errors
     A. Typographical error                       80          10.8
     B. Probable typographical error              36           4.8
     C. Incomplete command                       137          18.5
        Total                                    253          34.1

III. Errors in Verbs
     A. Omitted verb
        1. SET                                    33           4.5
        2. Other                                   5           0.7
           Total                                  38           5.2
     B. TO or DEMAND used directly                30           4.0
     C. Incorrect verb                             5           0.7
        Total                                     73           9.9

IV.  Errors in Arguments of Verbs
     A. Equation used in TYPE command             11           1.5
     B. Omitted "STEP" or "PART"                  24           3.2
     C. Errors in algebraic expressions
        1. Unmatched parentheses or
           absolute value signs                   15           2.0
        2. Other                                  20           2.7
           Total                                  35           4.7
     D. Omitted quotation marks                    7           1.0
        Total                                     77          10.4

V.   Errors in Multiple Form of Argument
     A. Used X(1,2,3) for X(1),X(2),X(3)           8           1.1
     B. Used DEMAND or SET with multiple
        argument                                  23           3.1
     C. Omitted second occurrence of
        "PART" or "STEP"                           7           1.0
        Total                                     38           5.2

VI.  Errors in Modifiers
     A. Misplaced IF clause                        6           0.8
     B. Error in logical expression                7           1.0
     C. Modifier used with wrong verb
        1. FOR with TYPE                          16           2.1
        2. Other                                   7           1.0
           Total                                  23           3.1
     D. Used FOR with more than
        one variable                              34           4.6
        Total                                     70           9.5

VII. Miscellaneous                               140          18.9

     TOTAL                                       740         100.0
The next three subclasses are contained in the class called
transient errors. These are typographical and related errors, and
accounted for 34.1% of all syntax errors.

5. Typographical errors (10.8%). A strict criterion was used for
this subclass. The errors include (a) doubling of letters (PARRT for
PART), but not doubling of nonalphabetic characters, (b) omitting of
letters (DELTE for DELETE), but only for words containing at least three
other letters in correct sequence so that the word could be identified
unambiguously, (c) substituting L for 1, and (d) substituting any
character for another character with an adjacent keyboard position,
provided that both characters were not digits and that no other similar
substitution resulted in an identifiable expression with a different
semantic value (for example, FO can be taken as a typographic
substitution for DO or for TO since F is adjacent to both T and D on the
keyboard; hence, this error was not classified as a typographical
error).
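
Criteria (a) and (b) lend themselves to a mechanical test. The sketch
below, in Python, is one possible reading of those two rules and is not
the procedure actually used in the study; in particular, treating "at
least three other letters in correct sequence" as a minimum length of
three for the typed word is an assumption, and both function names are
illustrative.

```python
def is_doubled_letter(typed: str, intended: str) -> bool:
    """True if `typed` is `intended` with exactly one letter doubled."""
    if len(typed) != len(intended) + 1:
        return False
    for i, ch in enumerate(intended):
        # Only alphabetic characters count under criterion (a).
        if ch.isalpha() and intended[: i + 1] + ch + intended[i + 1 :] == typed:
            return True
    return False


def is_omitted_letter(typed: str, intended: str, min_remaining: int = 3) -> bool:
    """True if `typed` is `intended` with exactly one letter left out."""
    if len(typed) != len(intended) - 1 or len(typed) < min_remaining:
        return False
    return any(
        intended[i].isalpha() and intended[:i] + intended[i + 1 :] == typed
        for i in range(len(intended))
    )


print(is_doubled_letter("PARRT", "PART"))
print(is_omitted_letter("DELTE", "DELETE"))
```

Criterion (d) would additionally need a keyboard adjacency table and a
check that only one intended word is reachable, which is why FO is
excluded in the example above.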

6. Probable typographical errors (4.8%). This class includes the
typographical errors that did not satisfy the above criterion.
Categorizing these errors was guided in part by the student's subsequent
action. For example, if a student typed a line like

          FO PART 7

and immediately replaced it (before execution) by

          3.15 TO PART 7

the error was included here.

7. Incomplete commands (18.5%). If a line is an initial segment
of some correct AID command, it was counted as an error in this class.
In most instances, it appeared that the student changed his mind in
midstream and, rather than type an erase command to erase the line, he
typed the RETURN key and then retyped the command making the desired
correction. If these errors are considered momentary aberrations in the
same sense as typographical errors, rather than as semantic errors, they
account for 34% of all syntax errors. Together with subclasses (1) and
(2), which are also transient errors, the total reaches 43%, a
substantial portion of the syntax errors.
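
The "initial segment" test for an incomplete command amounts to a prefix
check against the command the student went on to type. A minimal Python
sketch follows; comparing against the student's next line is an
assumption about how such a check might be automated, not a description
of the grading actually done.

```python
def is_initial_segment(line: str, full_command: str) -> bool:
    """True if `line` is a proper, non-empty prefix of `full_command`."""
    line = line.strip()
    full_command = full_command.strip()
    return 0 < len(line) < len(full_command) and full_command.startswith(line)


# A student abandons a line and then retypes the complete command.
print(is_initial_segment("3.15 TO PA", "3.15 TO PART 7"))
print(is_initial_segment("3.15 TO PART 7", "3.15 TO PART 7"))
```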

Errors in verbs, which constitute 9.9% of the syntax errors, are
divided into three subclasses:

8. Omitted verbs (5.2%). Because this occurred much more
frequently for SET than for all other verbs, this subclass was
subdivided to emphasize that difference. Of the 5.2% of the syntax
errors that are due to omitted verbs, 4.5% are omissions of SET. The SET
command, unlike any other AID command, can be given without the verb but
only when used directly. To illustrate this distinction,

          X = 7

may be used in place of

          SET X = 7,

but

          3.5 X = 7

cannot replace

          3.5 SET X = 7.

We shall see other evidence of such logical 'overgeneralization' again
in this discussion.
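
The rule that SET may be dropped only in a direct command can be written
as a small classifier. The Python below is a sketch under two
assumptions that are not stated in the text: a stored (indirect) command
is recognized by a leading step number of the form digits.digits, and
variable names are single capital letters.

```python
import re

# Assumption: an indirect command begins with a step number such as 3.5.
_STEP_PREFIX = re.compile(r"^\s*\d+\.\d+\s+")
# Assumption: the assigned variable is a single capital letter.
_BARE_ASSIGNMENT = re.compile(r"^[A-Z]\s*=\s*\S")


def classify_assignment(line: str) -> str:
    """Classify an assignment line as 'ok', 'omitted SET', or 'other'."""
    match = _STEP_PREFIX.match(line)
    body = (line[match.end():] if match else line).strip()
    if body.startswith("SET "):
        return "ok"
    if _BARE_ASSIGNMENT.match(body):
        # The verb may be dropped only when the command is given directly.
        return "omitted SET" if match else "ok"
    return "other"


for example in ("X = 7", "SET X = 7", "3.5 X = 7", "3.5 SET X = 7"):
    print(example, "->", classify_assignment(example))
```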

9. TO or DEMAND used directly (4.0%). Of the commands taught in
the first 32 lessons of the AID course, TO and DEMAND are the only two
that cannot be used directly. Although the error message is specific
and unambiguous (DON'T GIVE THIS COMMAND DIRECTLY), there were 30 such
errors. A reasonable explanation for many of these errors is that the
student forgot to type a step number; this explanation is supported by
the fact that the errors occurred more frequently in the first step of
a program (with DEMAND) than elsewhere, giving one the impression that
the student's concentration on the semantic structure of the program was
sufficiently intense to preclude minor syntactic considerations. This
kind of error is related to errors in the first subclass of semantic
errors, and is mentioned again when those errors are discussed.

10. Incorrect verbs (0.7%). We expected that there would be more
errors due to incorrect verbs than the five found in the data. The
incorrect verbs found include DELETE for DISCARD, PRINT for TYPE, etc.
There was no evidence of misspellings other than typographical errors,
which are not in this class.

The fourth category of syntax errors includes those made in
arguments of verbs. These accounted for 10.4% of the syntax errors,
somewhat more than the 9.9% for errors in verbs, and less than either
the 12.0% for format errors or the 34.1% for transient errors.

11. Equation used as argument for TYPE (1.5%). Technically, a
command like

          TYPE Y = 2 * X

is not a syntax error, since logical expressions can be used as
arguments for TYPE (and will return either TRUE or FALSE). However, this
form of the TYPE command was not taught in the first 32 lessons, and
other evidence in the data indicates that students were incorrectly
using this command as a combination of

          SET Y = 2 * X
          TYPE Y.

All such instances caused an error message that an undefined variable
had been used, so that if one takes a more rigid view of the
classification scheme, these errors should be grouped into the first
subclass of semantic errors. We felt that to do so would be misleading
even though correct.

12. Omitted STEP or PART (3.2%). Some examples of these errors
are:

          DO 3.1       for   DO STEP 3.1
          TO 4         for   TO PART 4
          DELETE 5     for   DELETE PART 5.

(Since a command like DO 3.1 cannot be interpreted as other than DO STEP
3.1, one might reasonably fault the interpreter rather than the student.
The same complaint can be made about many other errors described here.)
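
A more forgiving interpreter could supply the missing word itself. The
Python sketch below shows one way this might be done; it is a
hypothetical normalizer, not a feature of AID. It assumes, on the
strength of the three examples above, that a number with a decimal point
names a step and a whole number names a part.

```python
import re

# Only verbs whose numeric argument must name a step or a part.
_BARE_NUMBER = re.compile(r"^(DO|TO|DELETE)\s+(\d+(?:\.\d+)?)\s*$")


def insert_step_or_part(command: str) -> str:
    """Return `command` with STEP or PART inserted when it was omitted."""
    match = _BARE_NUMBER.match(command.strip())
    if match is None:
        return command  # nothing to repair
    verb, number = match.groups()
    # Assumption: decimal numbers denote steps, whole numbers denote parts.
    keyword = "STEP" if "." in number else "PART"
    return f"{verb} {keyword} {number}"


for typed in ("DO 3.1", "TO 4", "DELETE 5", "DO PART 2"):
    print(typed, "->", insert_step_or_part(typed))
```

TYPE is left out on purpose, since a command such as TYPE 5 is
presumably a legitimate request to type a number.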

13. Errors in algebraic expressions (4.7%). Only syntactic errors
are included here. Semantic errors in algebraic expressions are
discussed later. Nearly half (15 out of 35) of these errors were in the
grouping described in Table 8 as 'unmatched parentheses or absolute
value signs'. In many cases these errors had more the appearance of
typographical than of conceptual errors, as in

          TYPE F(3.5)).

Among the other errors in algebraic expression, one that occurred several
times was the omission of the multiplication symbol.
14. Omitted quotation marks (1.0%). Because of numerous
use-mention errors that occurred during pilot testing of the AID course,
we expected a higher rate of these errors. In fact, students used text
strings more often and made fewer errors than we expected.

The next three subclasses of syntactic errors also occurred in
arguments of verbs but they were errors in the forms of multiple
arguments rather than single arguments.

15. Used X(1,2,3) for X(1),X(2),X(3) (1.1%). Errors of this kind
usually occurred in TYPE commands:

          TYPE F(10,20,30)   for   TYPE F(10),F(20),F(30).

These errors could have been classed with delimiter errors, but were so
different from other delimiter errors that they were put into a separate
class.

16. Used DEMAND or SET with multiple arguments (3.1%). The only
two AID verbs that allow multiple arguments are TYPE and DELETE, so
commands like

          SET X = 1,2,3

and

          DEMAND X,Y,Z

are in error. Apparently, the students overgeneralized the rule that
allows multiple arguments and produced these reasonable but erroneous
commands.

17. Omitted second occurrence of PART or STEP (1.0%). Some
examples of these errors are:

          TYPE PART 2,3,4
          DELETE STEP 2.1,4.15,2.2

Also grouped with these are similar commands in which the words PART or
STEP were pluralized:

          TYPE STEPS 1.2, 1.3, 1.4
          DELETE PARTS 10,11,12.
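
Commands of this kind are unambiguous, so the intended form can be
produced mechanically. The Python below is a hypothetical sketch of such
a rewrite and is not part of AID. The exact spelling of the corrected
command (the keyword repeated before every number, items separated by a
comma and a blank) is an assumption based on the name of this subclass.

```python
_SINGULAR = {"PARTS": "PART", "STEPS": "STEP"}


def repeat_keyword(command: str) -> str:
    """Repeat PART or STEP before every number in a multiple argument."""
    pieces = command.split(None, 2)
    if len(pieces) != 3:
        return command
    verb, keyword, rest = pieces
    keyword = _SINGULAR.get(keyword, keyword)  # accept the plural forms
    if keyword not in ("PART", "STEP"):
        return command
    numbers = [item.strip() for item in rest.split(",")]
    return f"{verb} " + ", ".join(f"{keyword} {number}" for number in numbers)


print(repeat_keyword("TYPE PART 2,3,4"))
print(repeat_keyword("TYPE STEPS 1.2, 1.3, 1.4"))
```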

The next four subclasses contain errors found in AID modifiers.

18. Misplaced IF clause (0.8%). This kind of error, which occurred
rarely, was caused by a transposition of the main clause and the
conditional clause:

          3.7 IF X > Y TYPE X.

This order of clauses is used in many other programming languages.

19. Error in logical expression (1.0%). Several of these errors
resulted from attempts to use commas to indicate conjunctions:

TYPE ^AJ[JUA-<'^,C,D.. 

20. Modifier used with wrong verb (3.1%). The most common of
these errors (16 out of 23) resulted from an attempt to use FOR as a
modifier of TYPE:

          TYPE 3*X↑(1/2) FOR X = 1,2,3,4.

Since FOR can be used only with DO, this resulted in an error message.
Other instances of errors in this class are the use of TIMES as a
modifier for DEMAND, the use of AS as a modifier for TYPE, etc.:

          DEMAND X, 3 TIMES
          TYPE R AS "RADIUS".

21. Used FOR with more than one variable (4.6%). Most occurrences
of this error were for Problem L15-15 and reflect an omission in the
curriculum. The problem asked for a program that would type the larger
of two numbers, and the partial model shown in the problem did not
include any clues to how the input of two variables could be managed.
The students had little previous experience with multiple input but had
used FOR on many occasions to input several values for a single
variable. Some examples of the commands produced by students are

          DO PART 2 FOR X = 1; Y = 2
          DO PART 2 FOR (X,Y) = (1,2).

Students tended to persist in these errors, typing the same command
again or a similar one, even after receiving an error message. With a
warning that FOR cannot be used with more than one variable, these
errors could probably have been avoided entirely. (Such a change was
made in a subsequent revision of the curriculum.)

22. Miscellaneous (18.9%). The errors classed as miscellaneous
are too varied to be simply characterized. However, a large portion of
these are probably typographical and are of a transient nature (i.e.,
many of the errors were corrected before execution or after an error
message was given). A few of these errors were caused by attempts to
use text strings as one of several arguments for a TYPE command:

          TYPE "THE AREA IS", A

This error was caused by a bug in the interpreter and no warning about
it was given in the lessons.

The second main group of errors, the semantic errors, are also
divided into classes and subclasses; in this case, 9 classes containing
16 subclasses. The distribution of semantic errors is shown in Table 9.
Of the 1090 errors analyzed, 350 (32%) were semantic. Each subclass is
described individually. The first four subclasses are in the class
                              Table 9

                 Classification of Semantic Errors

                                               Number     Percent of
Classification                                of errors  total errors

I.    Errors in Use of Variables
      A. Real                                    106         30.3
      B. Functions                                 9          2.6
      C. Step or Part                             20          5.7
      D. Other                                     9          2.6
         Total                                   144         41.2

II.   Algebraic Errors
      A. Omitted parentheses                       6          1.7
      B. Incorrect operator                       25          7.1
      C. Other                                    66         18.9
         Total                                    97         27.7

III.  Errors in Logic
      A. Logical expressions                      14          4.0
      B. Sequence of execution                    16          4.6
      C. Other                                     2          0.6
         Total                                    32          9.2

IV.   Errors in Use of Dummy Variables            13          3.7

V.    Confusion between LET and SET                6          1.7

VI.   Confusion between STEP and PART              8          2.3

VII.  Numerical Error                             11          3.1

VIII. No Provision for Output                      6          1.7

IX.   Miscellaneous                               33          9.4

      TOTAL                                      350        100.0
titled "Errors in Use of Variables." The errors in the first three
subclasses are reference errors, that is, errors caused by attempts to
use undefined variables, undefined functions, etc. Errors in the use of
dummy variables are not included here but in Subclass 11. Altogether,
errors in the use of variables, excluding dummy variables, accounted for
41.2% of the semantic errors.

1. Real variables (30.3%). These errors resulted from attempts to
execute a command containing undefined real variables, unindexed or
indexed, in an algebraic expression. A large number of these errors were
caused by the inadvertent omission of a step number, which caused the
commands to be executed immediately, rather than stored and executed
later as intended. Thus, some of the errors in this class are closely
related to some of the errors in Syntax Subclass 9.

2. Functions (2.6%). These errors resulted from attempts to use
undefined functions. In several cases students used one name for the
function when defining it and inadvertently used another in a later
function call. Some of these errors, however, indicate a deeper
conceptual misunderstanding; errors in which the name of the dummy
variable was used as the function name in the function call are of this
type.

3. Step or part (5.7%). These errors occurred when students
attempted to execute, list, or delete an undefined step or part.

4. Other errors in the use of variables (2.6%). Most of these
occurred with indexed variables.

All of the errors listed above caused error messages, and are thus
closely related to the syntax errors. Also, like most syntax errors,
most of these errors were immediately corrected by the students and
are not indicative of any serious misconceptions. The remaining semantic
errors are, for the most part, evidence of more fundamental
misunderstandings.

The next three subclasses contain algebraic errors, other than syntax
errors.

5. Omitted parentheses (1.7%). Most of these errors occurred in
Problem L12-4, where students used expressions like A + B + C/3 to find
the average of three numbers.
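
The effect of the missing parentheses is easy to see numerically. The
Python sketch below uses the same operator precedence as ordinary
algebra, with division binding more tightly than addition; the function
names and the sample values are illustrative only.

```python
def average_as_typed(a: float, b: float, c: float) -> float:
    """A + B + C/3: only C is divided by 3."""
    return a + b + c / 3


def average_intended(a: float, b: float, c: float) -> float:
    """(A + B + C)/3: the whole sum is divided by 3."""
    return (a + b + c) / 3


print(average_as_typed(3, 6, 9))
print(average_intended(3, 6, 9))
```

For the values 3, 6, and 9 the typed form gives 12 where the intended
average is 6.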

6. Incorrect operator (7.1%). Most of these errors occurred in
Problem L5-30, where students used expressions like 6.9*.3937 instead of
the correct 6.9/.3937.

7. Other algebraic errors (18.9%). This is the second most numerous
subclass of the semantic errors, and the errors were varied. Many were
the result of incorrect translations of algebraic expressions into AID
notation and many others were the result of incorrect expressions that
were correctly translated. The two problems with the most errors in
this subclass were the only two problems that required the use of the
standard AID function IP (integer part).

The next three subclasses contain errors in logic, which accounted
for 9.2% of the semantic errors.

8. Errors in logical expressions (4.0%). These errors in forming
conditional clauses occurred most frequently for Problem L24-11, which
asked for a program to count from 1 to N, and the most common error was
the use of < for <=, which caused the program to count to N - 1 rather
than N.
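
This off-by-one behavior can be reproduced in a few lines. The Python
below is a sketch of the counting loop, assuming that the intended test
was "less than or equal to" and that the students wrote a strict "less
than"; it is not a transcription of any student's AID program.

```python
def count_to(n: int, inclusive: bool) -> list:
    """Count upward from 1 while the loop condition holds."""
    counted = []
    i = 1
    while (i <= n) if inclusive else (i < n):
        counted.append(i)
        i += 1
    return counted


print(count_to(5, inclusive=True))
print(count_to(5, inclusive=False))
```

With the strict comparison the loop stops one value early, at N - 1.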
9. Errors in sequence of execution (4.6%). There were fewer errors
in sequence of execution than we anticipated; however, errors caused by
the complete omission of a branching command were not tabulated. One
error that occurred several times was of this form:

          1.1 DEMAND X
          1.2 TYPE F(X)
          1.3 DO PART 1

If TO had been used in place of DO, this program would have been
considered correct. As it is, it functions correctly for a large number
of iterations (sufficient for most purposes) and fails only when the
capacity of the push-down stack is exceeded. Since students were taught
nothing about this feature of the interpreter, and since this program
functioned correctly from the student's point of view, it might have
been better not to have considered the DO command in error.
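
The difference between the two verbs can be modeled with an explicit
depth counter. The Python sketch below is a simplified model and not the
AID interpreter: it assumes that each DO PART 1 at step 1.3 adds one
entry to the push-down stack that is never removed, that TO adds
nothing, and that the stack capacity is an arbitrary illustrative number
(the real limit is not given in the text).

```python
class StackOverflow(Exception):
    """Raised when the simulated push-down stack is full."""


def run_with_do(iterations: int, stack_capacity: int) -> int:
    """Simulate the loop closed by DO PART 1; return the final depth."""
    depth = 0
    for _ in range(iterations):
        # Steps 1.1 and 1.2 would run here.
        depth += 1  # DO PART 1 nests a new activation of the part
        if depth > stack_capacity:
            raise StackOverflow("push-down stack capacity exceeded")
    return depth


def run_with_to(iterations: int, stack_capacity: int) -> int:
    """Simulate the loop closed by TO PART 1; the stack never grows."""
    depth = 0
    for _ in range(iterations):
        pass  # TO PART 1 transfers control without nesting
    return depth


print(run_with_do(3, stack_capacity=3))
print(run_with_to(1000, stack_capacity=3))
```

In this model the DO version works until the number of repetitions
exceeds the capacity, while the TO version runs indefinitely.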

10. Other errors in logic (0.6%).

11. Errors in use of dummy variables (3.7%). Several of these
errors occurred when the student changed the name of the dummy variable
in the middle of a LET command:

          LET F(X) = 3.14*R↑2.

All of the errors with dummy variables indicated a serious conceptual
difficulty, which the curriculum did little to dispel.

12. Confusion between LET and SET (1.7%). As a rule, LET and SET
cannot be interchanged, and certainly not in the ways they were used in
the lessons. There were, however, several instances where LET was used
correctly, but a SET command would have been preferable. These did not
count as errors although it is likely that these students were confused
about the difference between LET and SET.

13. Confusion between STEP and PART (2.3%). An example of the
errors in this subclass is

          DO PART 3.25.

14. Numerical errors (3.1%). Several numerical errors were
probably typographical in nature.

15. No provision for output (1.7%). Again, more errors were
expected than occurred.

16. Miscellaneous (9.4%).

The error analysis was undertaken more in the spirit of
demonstrating a method of error analysis for use with similar data than
as a definitive study of the kinds of errors students make in learning
to program. With only 25 problems and 40 students, the data are too
sparse to warrant viewing the statistics as more than indications of
tendencies.

Some tendencies are clearly indicated, however. Typographical errors
accounted for the largest part of the syntax errors, and reference errors
(Subclasses 1, 2, and 3 of the semantic errors) accounted for the largest
portion of the semantic errors. A sizable number of syntax errors are
'reasonable' errors, that is, commands that could have been interpreted
correctly had the interpreter been prepared for them; most of these
resulted from misapplying, or overextending, some existing syntactical
rule. In the semantic errors the second most numerous subclass was the
class of algebraic errors, which partially confirms a previous suspicion
that students were more deficient in algebra than the curriculum assumes
and that the curriculum does not devote enough time to teaching
underlying algebraic concepts.

Several summary statistics are presented in Table 10 for individual 
problems. For each problem, the totals of both syntactic and semantic 
errors and the ratios of these to the number of students who attempted 
each problem are shown. Some of these derived statistics are used in 
the next chapter as measures of problem difficulty. 

In comparing the error rates for different problems it is clear
that Problems L32-5 and L32-8 are difficult both syntactically and
semantically; these two problems also appeared among the most difficult
by other measures of difficulty.

The correlation between syntactic and semantic errors is not high
(.31) and the most striking discrepancy is for Problem L29-19 for which
there were 70 syntax errors and only 3 semantic errors. This problem
is also one that was mentioned as extremely difficult by the criteria
used in Chapter IV.
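
A correlation of this kind can be computed as an ordinary product-moment
(Pearson) coefficient over the per-problem error figures. The text does
not say which coefficient was used, so the Python below is a sketch
under that assumption; the function name is illustrative, and the two
short lists in the usage lines are invented values chosen only to show
the call, not data from the study.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Product-moment correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-problem rates, for illustration only.
syntax_rates = [0.3, 0.5, 1.0, 2.7]
semantic_rates = [0.6, 0.3, 0.9, 0.1]
print(round(pearson_r(syntax_rates, semantic_rates), 3))
```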

The error analysis described above yielded some interesting results
and pointed the way for future detailed studies of similar data.
However, the results may be misleading because in classifying and
counting errors, we used the occurrences of errors rather than the
number of students who made errors. One would expect some correlation
between the two categories but it would be far from perfect, for there
were a number of cases in which a sizable number of errors were made by
only a relatively few students. This was particularly striking when a
student persisted in repeating an error many times even after receiving
intervening error messages.
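
The distinction between occurrences of an error and students who made it
is simple to compute once each error is recorded with the student who
produced it. The Python below is a sketch only: the log format, the
student labels, and the error names are hypothetical and are not taken
from the study's data.

```python
from collections import Counter


def summarize(error_log: list) -> tuple:
    """Return (occurrences per error kind, distinct students per kind)."""
    occurrences = Counter(kind for _, kind in error_log)
    students = {
        kind: len({student for student, k in error_log if k == kind})
        for kind in occurrences
    }
    return occurrences, students


# Hypothetical log of (student, error kind) pairs.
log = [
    ("s1", "FOR with two variables"),
    ("s1", "FOR with two variables"),
    ("s1", "FOR with two variables"),
    ("s2", "FOR with two variables"),
    ("s3", "omitted SET"),
]
occurrences, students = summarize(log)
print(occurrences["FOR with two variables"], students["FOR with two variables"])
```

Here one persistent student accounts for three of the four occurrences,
which is the situation the paragraph above warns about.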
                              Table 10

                 Classification of Errors by Problem

                               Type of Error

                     Syntax                      Semantic

Problem      Number of     Errors        Number of     Errors
number        errors     per student      errors     per student

L5-30           11          0.31          [ill.]       [ill.]
L8-9            10          0.27          [ill.]       [ill.]
L8-27         [ill.]       [ill.]         [ill.]       [ill.]
L8-28         [ill.]       [ill.]         [ill.]       [ill.]
L9-3            26          0.81          [ill.]       [ill.]
L9-8            18          0.53          [ill.]       [ill.]
L10-12           3          0.08          [ill.]       [ill.]
L10-19          12         [ill.]         [ill.]       [ill.]
L11-11        [ill.]        0.58          [ill.]       [ill.]
L12-4           18          0.51          [ill.]       [ill.]
L13-29          22          0.63          [ill.]       [ill.]
L15-15          61          1.97          [ill.]       [ill.]
L15-17          36          1.29          [ill.]       [ill.]
L15-18        [ill.]        0.93          [ill.]       [ill.]
L15-21          32          1.10          [ill.]       [ill.]
L16-4           36          1.16          [ill.]       [ill.]
L16-6         [ill.]        1.86          [ill.]       [ill.]
L23-7           23          0.96          [ill.]       [ill.]
L24-11          21          1.00          [ill.]       [ill.]
L25-8           11         [ill.]         [ill.]       [ill.]
L26-5           35          1.25          [ill.]       [ill.]
L29-19          70          2.69          [ill.]       [ill.]
L32-5           75          2.88          [ill.]       [ill.]
L32-8           51          2.22            25          1.09
L32-19          21          1.50             7          0.50

Total          740          1.07           350          0.49
                        (average errors             (average errors
                         per student                 per student
                         per problem)                per problem)

[ill.] = entry illegible in the source copy.
CHAPTER VI

Problem Difficulty

In the last three chapters several measures of problem difficulty
were discussed: the number of typed commands that contribute toward a
correct solution, the proportion of students who produced correct
solutions, the number of syntax errors, etc. In this chapter these and
other measures are compared, and an attempt is made to account for the
variance in problem difficulty.

Nineteen measures are defined below. All are measures of qualities
presumed to be related in some way to problem difficulty. Some, like
the rate of syntax errors, vary directly with problem difficulty, whereas
others, like the proportion of students who produced correct solutions,
vary inversely. The first three variables are measures of proportion
correct, and the statistics are derived from those discussed in Chapter
IV (Distribution of Correct Solutions). The next ten measures are based
on errors; the values of these variables are found from the statistics
discussed in Chapter V (Errors). There are five measures of the effort
expended, using statistics from Chapters III and IV. The final measure,
the proportion of students who attempted the problem, is evaluated from
statistics given in Chapter II. The 19 measures are described below and
the values for each problem are given in Table 11.

Proportion Correct. All three measures of proportion correct are
ratios of the number of students who gave correct solutions to the number
of students who attempted the problem. This definition of proportion
correct is somewhat different from the definitions used by others.
                              Table 11

          Values for Measures Related to Problem Difficulty

Prob.
No.         M1      M2      M3      M4      M5      M6      M7      M8      M9     M10

L5-30     .472    .889    .972    .306    .639    .944    .082    .172    .254   2.091
L8-9      .838    .892    .973    .270    .243    .514  [ill.]  [ill.]  [ill.]    .900
L8-27     .658    .711    .763    .553    .316    .868    .122    .070    .192    .571
L8-28     .618    .735    .824    .676    .441   1.118    .125    .082    .207    .652
L9-3      .594    .625    .906    .813    .563   1.375    .114    .079    .193    .692
L9-8    [ill.]    .559    .559    .529    .588   1.118    .098    .109    .208   1.111
L10-12    .778    .917    .944    .083    .361    .444  [ill.]    .123    .151   4.333
L10-19    .857    .914  [ill.]    .343    .200    .543  [ill.]    .061  [ill.]    .583
L11-11    .667    .667    .788    .576    .273    .848    .075    .036    .111    .474
L12-4     .800    .943    .971    .514    .600   1.114    .066    .077    .143   1.167
L13-29    .600    .600    .600    .629    .486   1.114    .095    .074    .169    .773
L15-15    .484    .516    .613   1.968    .355   2.323    .176    .032    .207    .180
L15-17    .643    .643    .679   1.286    .393   1.679    .097    .029    .126    .306
L15-18    .407    .407    .407    .926    .407   1.333    .102    .045    .147    .440
L15-21    .690    .690    .759   1.103    .310   1.414    .074    .021  [ill.]    .281
L16-4     .774    .839    .806   1.161    .323   1.484    .122    .034    .156    .278
L16-6     .276    .310    .345   1.862    .276   2.138    .082    .012    .095    .148
L23-7     .500    .500    .667    .958    .292   1.250    .067    .021    .088    .304
L24-11    .381    .524    .381   1.000    .905   1.905    .095    .086    .180    .905
L25-8     .720    .840    .720    .440    .480    .920    .088  [ill.]    .184   1.091
L26-5     .536    .571    .607   1.250    .679   1.929    .124    .067    .191    .543
L29-19    .115    .154    .115   2.692    .115   2.808    .190    .008    .198    .043
L32-5     .269    .269    .269   2.885   1.154   4.038    .124  [ill.]    .173    .400
L32-8     .435    .435    .500   2.217   1.087   3.304    .094    .046    .140    .490
L32-19    .429    .429  [ill.]   1.500    .500   2.000    .092    .031    .122    .333

Mean    [ill.]    .623    .663    1.06     .48    1.54    .101    .061    .162     .76
S.D.      .194    .218    .242     .75     .26     .87    .033    .038    .041     .86

[ill.] = entry illegible in the source copy.

                                                          (continued)
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Table 11 (contM) 




Commonly, the denominator of this ratio is taken to be the number of
students who encountered the problem, so that a failure to respond is
equivalent to an incorrect response; this definition is used whenever
all students are expected to respond to every exercise presented to them.
Since the AID course allowed students to omit problems without penalty,
we chose the definition using as a divisor the number of students who
actually attempted to solve the problem. The three measures of
proportion correct are as follows.

M1: Proportion Correct on First Trial--(number of students who gave
a correct solution on the first trial) ÷ (number of students who attempted
the problem). M1 is taken as the primary measure of problem difficulty.
This variable was discussed in Chapter IV. Its value ranges from 11.5%
for Problem L29-19 to 85.7% for L10-19. The mean is 55.6% for the set
of 25 problems.

M2: Proportion Correct on First Trial Disregarding Algebraic Errors--
(number of students whose solution on first trial was correct except for
possible algebraic errors) ÷ (number of students who attempted the problem).
For several problems, e.g., L5-30, L9-8, L12-4, students used an algebraic
formula that was incorrect, for instance, x*.3937 for x/.3937, although
in all other respects the solution was correct. As pointed out in
Chapter IV, disregarding errors in algebraic formulas increased the
proportion correct substantially for some problems, with increases ranging
up to nearly 100% (for L5-30). The mean of M2 is 62.3%, compared to
55.6% for M1, and the standard deviation is 21.8%. Pursuing the
comparison further, we find the correlation between M1 and M2 to be quite
high (r = .90), as shown in the correlation matrix for the measures of
problem difficulty (Table 12).
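As a minimal sketch of how the three proportions are formed, the Python
fragment below divides each count of successful students by the number of
students who attempted the problem. The function name and the counts
(17, 32, 35 out of 36) are hypothetical and serve only as an illustration;
the report itself gives the ratios, not the underlying counts.

```python
def proportion_correct(first_trial, first_trial_up_to_algebra, any_trial, attempted):
    """Return (M1, M2, M3) for one problem.

    Every measure uses the same divisor: the number of students who
    actually attempted the problem (students who skipped it are excluded).
    """
    if attempted <= 0:
        raise ValueError("at least one student must have attempted the problem")
    counts = (first_trial, first_trial_up_to_algebra, any_trial)
    return tuple(round(c / attempted, 3) for c in counts)


# Hypothetical counts, for illustration only.
m1, m2, m3 = proportion_correct(17, 32, 35, attempted=36)
print(m1, m2, m3)
```

Because the divisor excludes students who skipped the problem, a problem
that many students omit is not automatically scored as difficult.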

M3: Proportion Correct on All Trials--(number of students who
achieved a correct solution on some trial) ÷ (number of students who
attempted the problem). Had the teaching program consistently asked
students to make another try if their first was not successful, M3 would
be a better measure of problem difficulty. Since this was not done and
since the amount of help offered varied considerably, this measure is not
as satisfactory as either M1 or M2. The mean of M3 is higher than for
either M1 or M2 (66.3% compared to 55.6% and 62.3%) and the standard
deviation (24.2%) is also higher. M3 correlates slightly better with
M2 than with M1 (.94 vs. .88) although M1, like M3, is based on completely
correct solutions. This evidence in itself is not convincing but, coupled
with a closer study of the subsequent actions of students who made simple
algebraic errors on their first try, it supports the inference that the
teaching program is reasonably adept at detecting such errors, and offers
effective assistance to the students who made them.

Number of Errors. The three measures of number of errors are all
averages for the students who attempted the problem.

M4: Number of Syntax Errors per Student--(number of syntax errors) ÷
(number of students who attempted the problem). The average number of
syntax errors ranges from .08 for Problem L10-12 to 2.88 for Problem
L32-5, with a mean of 1.06 (S.D. .75). One would expect errors to
correlate negatively with proportion correct; this is true, and the
correlation coefficients are all quite large: r_{1,4} = -.72,
r_{2,4} = -.83, and r_{3,4} = -.80. The correlation with M2, which
disregards algebraic errors, is considerably higher than the correlation
with M1, lending support to the notion that M2 may be a better measure
of programming difficulty than M1.

M5: Number of Semantic Errors per Student--(number of semantic
errors) ÷ (number of students who attempted the problem). The average
number of semantic errors (.48) is less than half the average number of
syntax errors (1.06) and the range of values is also smaller (.11 to 1.15
with S.D. = .26). Furthermore, the correlation between syntax and semantic
errors is quite low (r_{4,5} = .32). The correlations between semantic
errors and the three measures of proportion correct are also quite low,
between [illegible] and -.32, and semantic errors, unlike syntax errors,
correlate better with M1 than with M2, as would be expected since a large
number of semantic errors are taken out by M2.

M6: Number of All Errors per Student--M4 + M5. The average number
of all errors is composed of about two-thirds syntax errors and one-third
semantic errors, with a mean of 1.54 errors per problem. The correlation
of M6 with syntax errors is remarkably high, r_{4,6} = .96, as compared to
.57 for the correlation with semantic errors. The correlations of
M6 with the three measures of proportion correct follow the same pattern
as M4, the number of syntax errors; again, there is a slightly higher
correlation with M2 than with M1 (see Table 12).

Error Rates. Since the total number of errors may be dependent upon
the number of commands given by the student, three measures of error
rates were also defined, corresponding to the three measures M4, M5, and
M6. For each of these, the number of errors is divided by the number of
commands typed.

M7: Syntax Error Rate--(number of syntax errors) ÷ (number of
commands typed). The mean syntax error rate is 10% with a standard
deviation of 3.3%. Recall that the count of syntax errors is a count of
errors themselves and not a count of commands in error, so one cannot
infer that an error rate of 10% indicates that one command in ten is in
error but that, in ten commands, there is on the average one error. M7
corresponds to M4, the number of syntax errors, and the correlation is
substantial but not spectacular (r = .59). In making comparisons with
proportion correct, we find that M7 follows the same pattern as M4 but
that the correlations are much lower; for example, r_{1,7} is only -.44
whereas r_{1,4} is -.72. As a predictor of proportion correct, the number
of syntax errors would serve much better than the rate of syntax errors,
accounting for 50% of the variance as opposed to 20%.
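The percentages of variance quoted here follow from squaring the
correlation coefficients. The short Python check below is only an
illustration of that arithmetic (the function name is ours); it uses the
two coefficients cited in the text, -.72 for the number of syntax errors
and -.44 for the syntax error rate.

```python
def variance_explained(r):
    """Proportion of variance accounted for by a single linear predictor."""
    return r * r


by_count = variance_explained(-0.72)  # number of syntax errors (M4) vs. M1
by_rate = variance_explained(-0.44)   # syntax error rate (M7) vs. M1

print(f"M4: {by_count:.2f}   M7: {by_rate:.2f}")
```

The squares are roughly .52 and .19, which the text rounds to 50% and 20%.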

M8: Semantic Error Rate--(number of semantic errors) ÷ (number of
commands typed). As we would expect, the semantic error rate is lower
than the syntax error rate (6% vs. 10%). Further comparing these two
measures, we notice that the correlation is negative (r_{7,8} = -.34), and
that the correlation between the semantic error rate and the number of
syntax errors is also negative and has an even higher value (r_{4,8} = -.63).
Furthermore, although all of the other error measures, M4 to M7, correlate
negatively with proportion correct, as expected, the semantic error rate
correlates positively with all three measures of proportion correct, and
the values, though not high, are substantial (.25, .58, and .50). We do
not know whether this phenomenon can be accounted for by characteristics
that are peculiar to this set of problems or curriculum, or whether it
is likely to be true for other programming problems given in other
circumstances. The evidence here is strong enough to warrant a closer
study of other data.

M9: Error Rate--(number of all errors) ÷ (number of commands typed).
The mean of M9 is 16.2%, an average of one error for every six commands
typed. M9 stands in the same relationship to M7 and M8 as M6 does to M4
and M5, and one would expect to find a similar pattern in the correlation
matrix. The similarities are few, however, and one of the more noticeable
variations is in the correlation with syntax errors. As we saw, M6, the
number of errors, correlated extremely highly with M4, the number of
syntax errors (r = .96). In comparison the correlation between M9 and
M7 is only .4[?]. The correlations with semantic errors are quite
comparable: r_{5,6} = .57 and r_{8,9} = .65. Thus, the rate of all errors
correlates better with the rate of semantic errors than with the rate of
syntax errors, whereas the opposite is true if we measure the number of
errors instead of the error rates. Although there was a fairly high
correlation between M7 and M4, and a lower but not insignificant
correlation between M8 and M5, the correlation between M9 and M6 is
essentially nil (r = -.03). As a final comparison between number of
errors and error rates, consider the value of M9 as a predictor of
proportion correct: not more than [illegible]% of the variance in
proportion correct could be accounted for by the total error rate; on the
other hand, M6, the total number of errors, could account for 50%.

From this discussion it is clear that measures of errors, and in
particular the total error rate, measure problem difficulty along a
different dimension than proportion correct. Although there is a very
high correlation between the number of syntax errors and proportion
correct, this may be because longer programs tend to be more difficult
and also afford more opportunities for syntax errors; that this is not
the whole story, however, is shown by the substantial correlation between
syntax errors and proportion correct even after length is factored out
(r_{2,7} = -.48, for example).

There are two more measures of error rates that may be of some
interest:

M10: Ratio of Semantic Errors to Syntax Errors--(number of semantic
errors) ÷ (number of syntax errors). The mean of M10 is .76; on the
average there are three semantic errors for every four syntax errors.
The range of this variable is large, .04 to 4.33, with a standard
deviation of .86. M10 is, as expected, negatively correlated with syntax
errors, both M4 and M7, and positively correlated with M5 and M8, the
two measures of semantic errors. Except for [illegible], these correlations
all have a magnitude of over .3; the correlation with M8, the
semantic error rate, is over [illegible]. The correlations with proportion
correct have the same pattern, and nearly the same values, as for M8.
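Three of the error measures are tied to the others by simple identities
that follow from the definitions: M6 = M4 + M5, M9 = M7 + M8 (both rates
share the divisor "commands typed"), and M10 = M5 ÷ M4 (the common divisor
"students who attempted" cancels). The sketch below checks these
identities against the values listed for Problem L8-28 in Table 11. The
function name and the tolerance of .002, which allows for rounding to
three decimals, are our own assumptions.

```python
def identities_hold(row, tol=0.002):
    """Check the three definitional identities for one row of Table 11."""
    return {
        "M6 = M4 + M5": abs(row["M4"] + row["M5"] - row["M6"]) <= tol,
        "M9 = M7 + M8": abs(row["M7"] + row["M8"] - row["M9"]) <= tol,
        "M10 = M5 / M4": abs(row["M5"] / row["M4"] - row["M10"]) <= tol,
    }


# Values for Problem L8-28 as printed in Table 11.
l8_28 = {"M4": 0.676, "M5": 0.441, "M6": 1.118,
         "M7": 0.125, "M8": 0.082, "M9": 0.207, "M10": 0.652}

checks = identities_hold(l8_28)
for name, ok in checks.items():
    print(name, "holds" if ok else "fails")
```

A check of this kind is also a convenient way to spot misprinted entries
in a table of derived measures.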

M11: Ratio of Errors to the Length of the Expected Correct Solution--
(number of errors) ÷ [(number of commands in the expected correct solution)
× (number of students who attempted the problem)]. This variable has a
mean of .37 and its correlation with the other measures of error rates
is fairly low except for the number of semantic errors (r_{5,11} = .60).
M11 correlates negatively with proportion correct, as we expect of error
measures, but the values are low.

The Number of Students Who Made Errors. For many problems, most of
the errors were made by only a few students. The fact that students
skipped different problems may cause some doubt about the reliability of
measures M4 to M11. The last two error measures do not share this defect.

M12: Number of Students Who Made Syntax Errors--(number of students
who made syntax errors) ÷ (number of students who attempted the problem).
On the average 48.7% of the students made one or more syntax errors, and
the range is from 13.9% for Problem L10-12 to 85.7% for L32-19. The
correlation with the average number of syntax errors is quite high
(r_{4,12} = .86) but not nearly so high with the syntax error rate
(r_{7,12} = .5[?]). Again we see evidence of the difference between syntax
and semantic errors: the correlation between the semantic error rate and
number of students who made syntax errors is negative and quite high
(r_{8,12} = -.67). The correlations of M12 with the three measures of
proportion correct follow the same pattern as for the other syntax error
measures (M4 and M7), showing the greatest correlation with M2, the
proportion correct discounting algebraic errors. These three correlations
are not quite as high as for M4 but are considerably higher than for M7.

M13: Number of Students Who Made Semantic Errors--(number of
students who made semantic errors) ÷ (number of students who attempted
the problem). On the average 3[?].6% of the students made one or more
semantic errors. The lowest value is [illegible], and the high of about
69% is for L32-5. Again the correlation is much higher with
the number of errors than with the error rate: r_{5,13} = [illegible],
whereas r_{8,13} = .[?]8. As we have come to expect, there is little
correlation with measures of syntax errors (r_{7,13} = .16, for example).
The lowest of all correlations with M13 is .04, the correlation
between M13 and the ratio of semantic to syntax errors. The correlations
of M13 with the measures of proportion correct are negative but the values
are not high (less than .4).

If M12 and M13 were to be taken as replacements for M4 and M5, we
can see that not much would be gained; the correlations r_{4,12} and
r_{5,13} are both quite high.

Leaving measures of errors, we turn now to measures of effort.

Effort Expended. Five measures of effort are defined, the last two
of which are ratios.

M14: Number of Commands Typed--(number of commands typed) ÷ (number
of students who attempted the problem). The number of commands typed has
a wide range, from 2.9 for Problem L10-12 to 23.5 for L32-8. The mean of
this variable is 10.2. We would expect this variable to vary directly
with problem difficulty and hence inversely with proportion correct; this
expectation is borne out and the correlations with M1, M2, and M3 are
quite high (e.g., r = -.75). Looking at the correlation with M4, we
confirm the suspicion that the number of commands and the number of syntax
errors are statistically dependent (r = .86), and the correlation with
syntax error rate is correspondingly, and satisfyingly, low (r_{7,14} = .16).
Still looking at the correlation vector for M14, we find that the
correlation with the semantic error rate is fairly high but negative
(-.65).

M15: Number of Commands Executed--(number of commands executed) ÷
(number of students who attempted the problem). The average number of
commands executed is 7.4, about three-fourths of the commands typed, and
the correlation with commands typed is extremely high (r_{14,15} = .98).
The correlation vector for M15 is quite similar to that for commands
typed, including the high negative correlation with the semantic error
rate (r_{8,15} = -.64).

M16: Commands Used in Partially Correct Solutions--(number of
commands used in correct or partially correct solutions) ÷ (number of
students who attempted the problem). The average number of commands
used in partially correct solutions is 4.[?], as compared to 7.4 executed
commands and 10.2 typed commands. As mentioned in Chapter IV, fewer
than half the typed commands contributed toward a correct solution. The
correlations of M16 with both commands typed and commands executed are
quite high: r_{14,16} = .77 and r_{15,16} = .85. The correlation vector for
M16 again follows the pattern set by M14 and M15 although the magnitudes
are generally lower. Again we note that the correlation with the syntax
error rate is low (-.08), and that the correlation with the semantic
error rate is still negative, though somewhat less than for M14 and M15
(-.56 as compared to -.65 and -.64). Since it is hard to believe that,
as a general rule, the rate of semantic errors declines with the number
of commands, it seems likely that this result is caused by unidentified
peculiarities of the set of problems or the curriculum.

M17: Ratio of Commands Used in Partially Correct Solutions to
Commands Typed--(number of commands used in correct or partially correct
solutions) ÷ (number of commands typed). This variable measures the
proportion of effort that contributed to a solution; the mean is 52.3%
and the range is from [illegible] to 80.7%. When examining the correlation
vector for M17, we see that M17 correlates well with the three measures of
proportion correct (r > .6) and in the expected direction. As expected,
it correlates negatively with the number of errors, although the
correlation with the number of semantic errors is not high, -.19 as
compared to -.71 for syntax errors. M17 also correlates negatively with
the syntax error rate (r = -.39), but the correlation with the semantic
error rate is positive (r = .[?]2), and there is a very low correlation
with the total error rate (r = .07).

M18: Commands Typed ÷ Length of Expected Correct Solution--(number
of commands typed) ÷ [(number of commands in the expected correct
solution) × (number of students who attempted the problem)]. This last
measure of effort is akin to an efficiency measure: it measures the
amount of effort in terms of a standard, and presumably relatively
efficient, solution. The range of M18 is from 1.09 to 5.[?]5 with a mean
of 2.27; on the average, students did over twice as much as was needed to
achieve a correct solution. In the correlation vector for M18 we find
only one sizable value: r_{11,18} = .66. Since M11 is also a ratio with
the length of the expected solution in the denominator, this value is a
reflection of the correlation between the total number of errors
and the number of commands typed. Of some interest are the very low
correlations with M7, M8, and M9, the three measures of error rates
(|r| < .05); relative efficiency, measured by M18, seems to have little
statistical relation to error rates.



Number of Students Who Attempted Problem. This final measure of
problem difficulty might have been classified as another measure of
effort, for it simply measures the proportion of students who made some
effort to solve the problem.

M19: Students Who Attempted Problem--(number of students who
attempted problem) ÷ [(number of students who attempted the problem) +
(number of students who skipped the problem)]. Although there are 40
students represented in the data, the denominator of M19 is not always
40 since some students did not progress far enough through the course to
encounter all of the 25 problems. The values of M19 range from 48.3% to
95% with a mean of 83.4%. We do not know why some students skip certain
problems, whether it is because a certain problem is perceived as
difficult, or perhaps as too easy and hence a waste of time. An examination
of student protocols indicates that there was neither a small group of
students who consistently skipped problems, nor a particular problem or
set of problems singled out. In fact, only two problems were skipped by
more than one quarter of the students. A study of the correlation vector
for M19 does not shed much more light on this question. There are only two
correlations greater than .5. The value of r_{12,19} is -.51, indicating
some statistical relationship with M12, the number of students who made
syntax errors. The value of r_{18,19} is -.56, indicating a relationship
with relative efficiency. At the other end of the scale, we see very low
correlations with the syntax error rate (r = .06) and with the number
of students who made semantic errors (r = -.01).

Several facts emerge from the above discussion of the 19 variables
related to problem difficulty. Foremost is that many of these measures
are statistically unrelated to one another. If all of them are measuring
some aspect of problem difficulty, then it is clear that the measurements
are along several quite different dimensions. There are, of course,
strong similarities between certain pairs of measures. M1 and M2, for
example, which are both measures of proportion correct, are closely
related both conceptually and statistically. For the most part, those
pairs that one would expect to be closely related do correlate highly
and in the expected direction. There is a striking exception to this
in the measures based on semantic errors; these measures, in
particular M8 and M10, do not relate to measures of proportion correct
or to measures of syntax errors in the way one would expect. In fact,
if we had looked only at M2 as a measure of proportion correct and M10
as a measure of the errors, we might have been tempted to conclude that
the error rate is highest for the easiest problems. We draw no such
conclusions, however, as we have not been able to formulate any
intuitively satisfying hypothesis that would account for this apparent
anomaly in the data.

Having 19 measures of problem difficulty is an embarrassment of
riches, and for more detailed study we chose from among them a smaller,
more manageable subset. As mentioned before, we consider M1 to be the
primary measure of problem difficulty because it is the most similar to
measures of problem difficulty used by other researchers and our results
can thus be more readily compared to results obtained by others. For
reasons already mentioned, we feel that M2, the proportion correct
disregarding errors in algebraic formulas, is a more satisfactory measure
of programming difficulty per se. M1 and M2 measure very similar
aspects of problem difficulty, so for variety we also chose four other
measures that seemed to be quite unrelated to M1 and M2 and to one another:
one is a measure of syntax errors (M7), another a measure of semantic
errors (M13), the third is a measure of efficiency or effort (M18), and
the last is the number of students who attempted the problem (M19). For
ease of reference these selected measures are listed in Table 13, which
also shows the correlations between each pair.


In pursuing the study of these six measures, we are interested in
discovering what characteristics of the problems or of the curriculum
influence problem difficulty, and how well we could have predicted
problem difficulty from an a priori evaluation of these characteristics.
The tool we used in this study was step-wise multiple linear regression
using ten independent variables to predict the values of the six selected
measures of problem difficulty. The ten variables are defined
independently of the data: some of them measure characteristics of the
problems themselves (ARG, FCT, and INPUT), some measure aspects of the
curriculum context (LES, HELP, VOCAB, and NEW), and some are obtained from
the expected correct solutions and are hence dependent upon both the
problems and their context (IF and LNG). The ten variables are described
below, and their values for each problem are given in Table 14.
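The regression procedure is named here but not specified in detail. As
a rough sketch only, the Python code below shows one common form of
step-wise selection: forward selection, in which the predictor giving the
largest gain in R-squared is added at each step until the gain falls below
a threshold. The threshold `min_gain`, the predictor names x1 to x3, and
the data are all hypothetical; none of them is taken from the study, and
the study's own entry criterion may have been different.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of a least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)


def forward_stepwise(X, y, names, min_gain=0.01):
    """Add predictors one at a time, best first, while R-squared improves enough."""
    chosen, history, best = [], [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate when added to the predictors already chosen.
        r2, j = max((r_squared(X[:, chosen + [k]], y), k) for k in remaining)
        if r2 - best < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best = r2
        history.append((names[j], r2))
    return history


# Hypothetical data: three mutually orthogonal predictors and a noise term.
x1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
noise = x1 * x2 * x3                      # orthogonal to x1, x2, and x3
y = 3.0 * x1 + 1.0 * x2 + 0.5 * noise     # x3 plays no part in y

steps = forward_stepwise(np.column_stack([x1, x2, x3]), y, ["x1", "x2", "x3"])
for name, r2 in steps:
    print(f"added {name}: R^2 = {r2:.3f}")
```

In this constructed example x1 enters first, x2 second, and x3 is never
added because it contributes nothing to the fit. With correlated
predictors, as in the present study, the order of entry is less clear-cut.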

IF: The variable IF is the proportion of conditional commands (i.e.,
the commands that contain an IF clause) used in the expected correct
responses listed in Chapter II. The values of IF vary from 0% to 67%
with a mean of 14% and a standard deviation of 20%, as shown in Table 14.

ARG: This variable depends upon the mathematical function required
by the problem. The values of ARG are

     0 if there is no argument for the function
     1 if there is one real argument
     2 if there are two real arguments
     3 if the argument is a stored list

The mean of ARG is 1.60.

                               Table 13

              Six Selected Measures of Problem Difficulty

Description of Six Measures

M1:  Proportion Correct.
     (number of students who gave a correct solution on their first
     trial) ÷ (number of students who attempted problem)

M2:  Proportion Correct up to Algebra.
     (number of students whose solutions on first trial were correct
     except for possible errors in algebraic formulas) ÷ (number of
     students who attempted problem)

M7:  Syntax Error Rate.
     (number of syntax errors) ÷ (number of commands typed)

M13: Number of Students who Made Semantic Errors.
     (number of students who made semantic errors) ÷ (number of
     students who attempted the problem)

M18: Efficiency.
     (number of commands typed) ÷ [(number of commands in the expected
     correct solution) × (number of students who attempted the problem)]

M19: Students who Attempted Problem.
     (number of students who attempted problem) ÷ [(number of students
     who attempted problem) + (number of students who skipped problem)]

Correlations

          M1     M2     M7    M13    M18    M19

M1      1.00    .90   -.44   -.37    .27    .39
M2             1.00   -.48   -.24   -.31    .44
M7                    1.00    .16    .01    .06
M13                          1.00    .20   -.01
M18                                 1.00   -.56
M19                                        1.00



                               Table 14

               Values of Independent (Problem) Variables



, Problem j 
Number 


• "a 
IF ' 


Ahg 




REa:T 


LNG 


INPT 


°LES 


HELP 


•■ft 

voc 


' . ' - L5i-30, 


.06 • 


1 


1 

1 


0 


2 


-.3 " 


5 


0 






.00 


1 


1 


0 


2 


3 \ 


8 


2. 


3 


' I:!.:'' l8*27 


.00 ' 


2 


. L 


, 0 


2 


2 


8 


0 


3 


.:l8-28 


.00 


1 • 


1 


0 


2 


5 


8 


0. 


3 


L9-3 


.00 


2 


1 


- 0 


2 


f 

U 


9 , 


' 0 


.3" 


L9-8 


.,00 


2 


1 


0 


2 . 




9 


0 


3 


LlO-12 


.00 . 




1 


0 


2 




10 


I 




1 Llo-19 

1 


> 00 


1 


1 


0 


2 


10 


10 


1 


t 

"5'. 


' • 1,11-11 


.00 ' 


1 


3 


0 


5 


5 


11 


0 


6 




.00 


' 2 " 


1 


.0 


5' 




12 


1 . 


7 


' / • L13-29 


.00 


1 


1 


0 


3" 




" 1.3 


0 


7.. 


^ ^, ' -115-4.5 


.ko 


2 


1 


0 


5 


0. 


, 15 


1 


8 


': * L15-J^7 


.36 


.2 


1 

1 

lo 


0 


10 , 


0 


,15 


'• 1 


'8 


/■ ^ 1:15-18 


.50 


2 


0 




"0 ' 


" 15 


•1 


8 


L15-21 


.30- 




r 


0 


10 




15 


0 


'9 


• L16-U 


.20 
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1 


0 


5 


0 


16/ 


2 


10 


L16-6 


.67 
.17 


\2 


1 


0 


12, 


9 


■ 16 


0 


CI 

10 


£23-7 


0 


a 


1 


6 ■ 


0 


23 


0 • 


'12 


L2U-11 ' 


.17 ^ 


1 


*i 


' 1 


6' 


0 


. 2h 






L25-8 . 


.00 


.1 


1' 


0 


2 


7 


25 


2 


12 


L26-5. , 


.60 


1 


2 


1 


7 ^ 


0 


26 


1 


• 12 


L29-19' 




2 


1 


0 


13 • 


1 


29/ 


' 0 


12 


',L3?-'5 


. .00' ' 


3 


2 


1 


; 6 


2 


32 


0 




L32-8 


.00 


3 
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1 




2 


■32 
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13 


^L32-19- 


.33 
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32 
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1 

q* 

0' 

y 

0' 

•1 

1- 

0 
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o 

0 

1' 
1 

0 
0 
•0 
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i3 
'0* 

1 ' 
0 ■ 

0 



            IF    ARG    FCT   REIT    LNG   INPT    LES   HELP    VOC    NEW

Mean       .14   1.60   1.16    .24   5.00   2.68   16.7    .60   7.96
S.D.       .20    .76    .47    .44   3.29   2.87    8.6    .76   3.81    .51
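The summary rows can be checked against the text for the one variable
whose values are fully described in prose: REIT is reported to be 1 for
six of the 25 problems and 0 for the rest. The sketch below assumes that
the tabled standard deviation is the sample standard deviation (divisor
n - 1); that assumption is ours and is not stated in the report.

```python
import statistics

# REIT: 1 for six problems, 0 for the other nineteen (25 problems in all).
reit = [1] * 6 + [0] * 19

reit_mean = statistics.mean(reit)
reit_sd = statistics.stdev(reit)   # sample standard deviation, divisor n - 1

print(f"mean = {reit_mean:.2f}, S.D. = {reit_sd:.2f}")
```

Under this assumption the result agrees with the REIT entries in the
Mean and S.D. rows (.24 and .44).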



FCT: Most programs written for the 25 problems in this set define
only a single mathematical function but a few define several functions
(of the same number of arguments). The value of FCT is the number of
mathematical functions defined, and may be 1, 2, or 3.

REIT: This is a 0-1 "reiteration" variable whose value is 0 if no
loops or subroutines are required and 1 otherwise.

LNG: This variable is the number of commands in the expected correct
solution. The values range from 2 to 13 with a mean of 5.

INPT: The number of sets of input values specified in the problem
is given by INPT, whose value ranges from 0 to 10 with a mean of 2.7.

LES: This variable, the lesson number, is a measure of the position
of the problem in the curriculum.

HELP: This variable measures the amount of help given in the
problem statement. Problem statements occasionally include an example
of a closely related program, or part of such a program, and HELP is a
measure of this kind of assistance. HELP = 0 if no model was given,
HELP = 1 if a partial model was given, and HELP = 2 if a complete model
was given. The value of HELP is nonzero for 11 of the problems,
with a mean of .6.

VOC: This variable measures the amount of AID vocabulary that had
been presented by the curriculum before the problem was given. The
lexical items that are counted are

     SET
     LET
     DO STEP (used directly)
     FOR
     DO PART (used directly)
     DEMAND
     IF
     AND
     TO
     DO (used indirectly)
     FORM
     Indexed Variables

NEW: This is a 0-1 variable that depends upon whether the problem
requires the use of a command or function that has not been used in a
preceding programming problem. For this definition "programming problems"
are taken to be any exercises that require the use of the AID interpreter
other than those problems that require only that the student copy
verbatim AID commands printed by the teaching program. The "new" commands
and functions considered here are not restricted to the list given above
for VOC. Approximately half of the problems do require the use of a new
word, and hence have a value of 1 for NEW.



The ten variables described above were used as independent variables
in step-wise multiple linear regressions in an attempt to discover which
were effective in accounting for problem difficulty. For this purpose
it is best to use variables that are statistically independent of one
another. This goal is difficult to achieve, and was approached with
only moderate success by this set of variables, as can be seen from the
correlation matrix given in Table 15. There are five pairs of independent
variables for which the correlation is greater than .5. The
first of these is IF-LNG. In a preliminary study, the variable IF was
defined to be the number of conditional commands used in the expected
correct solution, rather than the proportion of conditional commands.
When it was found that this variable correlated highly with length
(r = .85), an attempt was made to reduce the correlation by dividing by
length. Some reduction was accomplished but it was not as great as was
hoped for (from .85 to .73). A study of the values of IF and LNG (see
Table 14) reveals that for 9 of the 25 problems the expected correct
solution contains only two commands, and that for these problems the
value of IF is zero; this fact alone explains the high value of r and is
sufficient reason for considering separate analyses of programs with and
without conditional commands. This was not done here because the size of
the sample is too small to warrant subdivision.

The second pair of highly correlated variables is REIT-LES, for
which r = .77. Whether or not there are loops or subroutines (REIT) is
highly dependent upon lesson number (LES). For the first 17 problems,
included in Lessons 1 to 16, the value of REIT is zero. The value of
REIT is 1 for six of the remaining eight problems. No reasonable way of
transforming either REIT or LES to reduce the correlation was apparent.

VOC is also highly correlated with LES (r = .94) and with REIT
(r = .68). It is to be expected that the amount of vocabulary introduced
will be dependent upon lesson number and we would expect VOC and LES to
account for approximately the same variance in problem difficulty.

There is one remaining pair of variables with a high correlation:
for LNG-VOC the value of r exceeds .5. In Chapter III we observed a similar
phenomenon, that the total number of commands in the data tended to
increase with problem number but that the increase depended more upon an
increase in the variety of commands than upon an increase in the
occurrence of a given kind of command. That comment referred to the data,
whereas LNG is a function of the expected correct response; the same
observation seems to be true for both, however.

As a preliminary view of the relationships between the ten
independent variables and the six selected measures of problem difficulty,
a correlation matrix is shown in Table 16. The correlation vectors for
M1 and M2, the two measures of proportion correct, are strikingly similar.
Both M1 and M2 are correlated highly (negatively) with LES; there are
also substantial correlations with IF, ALG, LNG, and VOC; and the
correlations with FCT and INPT are quite low. The pattern of the correlation
vector for M19 shows some resemblance to the vectors M1 and M2 although
the similarities are not as great as between M1 and M2. There are few
similarities between the other correlation vectors, and the difference
between M7, the syntax error rate, and M13, the number of students who
made semantic errors, is marked. REIT, for example, correlates quite
well with M13 (r = .5) but not at all with M7 (r = -.025). Also, the
coefficient for IF-M7 is positive whereas it is negative for IF-M13.
In general, the correlations with M2 are high, followed closely by M1,
and the correlations for M7 are low.

Using BMD02R we ran six step-wise regressions, one for each of the
six selected measures of problem difficulty, and derived linear
equations for the prediction of each of those measures. These equations
(with coefficients rounded) are given in Table 17. For ease of reading
we have transformed each equation to yield percentages rather than
fractions. Our primary purpose in using step-wise regressions was not
to produce these linear models, however, but to determine which of the
independent variables had the greatest influence and to find out how
much of the variance in problem difficulty could be accounted for by
linear combinations of the ten independent variables.

                                Table 16

            Correlations Between Independent Variables and
                  Six Measures of Problem Difficulty

Independent               Measure of Problem Difficulty
Variable         M1       M2       M7      M13      M18      M19

IF             -0.494   -0.579    0.277   -0.308   -0.151   -0.350
ALG            -0.439   -0.506    0.265    0.225    0.572   -0.299
FCT            -0.034   -0.129   -0.014    0.295   -0.055    0.125
REIT           -0.387   -0.443   -0.025    0.496    0.531   -0.576
LNG            -0.491   -0.603    0.256   -0.022   -0.324   -0.133
INPT            0.198    0.238   -0.215   -0.245    0.002    0.248
LES            -0.529   -0.648    0.245    0.382    0.464   -0.599
HELP            0.475    0.394   -0.064    0.026   -0.070    0.166
VOC            -0.436   -0.585    0.155    0.303    0.270   -0.555
NEW             0.252    0.358   -0.162    0.045   -0.085    0.395

Table 18 is a summary table of the dynamics of the regression,
showing for each measure of problem difficulty the order in which the
independent variables entered into the regression and the amount of
variance accounted for at each step. The total amount of variance
accounted for is shown at the bottom of each column, and we will start
with those totals. The first fact of interest is that over 80% of the
variance of M2 and of M18 is accounted for; that is, the models serve
quite well for predicting the proportion correct up to algebra and the
amount of effort made by students relative to a fixed set of correct
solutions. The models for M1 and M19 are also reasonably good; [...]% of
the variance in proportion correct can be accounted for, and 67% of the
variance in the number of students who attempt the problem. The models
for M7, syntax error rate, and M13, the number of students who made
semantic errors, are less satisfactory, with less than 50% of the
variance accounted for.

In considering these figures it should be kept in mind that we are
using ten independent variables to account for the variance in 25
problems, and thus would expect to account for a substantial portion of the
variance even if our independent variables were poorly chosen. For a
comparison, let us see what the results would be if we selected only
five of the ten independent variables (the best five in each case). The
amount of variance accounted for by the first five variables to enter
the regression is also shown at the bottom of Table 18. For all but
one regression, the first five variables account for 95% to 98% of the
variance accounted for by the full set of ten variables; in other words,
our prediction would be essentially as good if we used only one-half of
the independent variables. The noticeable exception to this rule is the
model for the prediction of M7, the syntax error rate. For this measure
of problem difficulty the prediction with ten variables is not good (only
42%) and the prediction based on the best five variables is quite
unacceptable (19%). In short, our derived linear model for the prediction
of M7 is worthless for practical purposes.
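
The caution about fitting ten predictors to 25 problems can be made
concrete with the standard adjusted R-squared correction. This correction
is not part of the original analysis; the sketch below assumes that the
percentages quoted above are ordinary R-squared values, and the function
name adjusted_r2 is purely illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Figures quoted in the text, with n = 25 problems.
# "Over 80%" is taken as exactly 0.80, which is an assumption.
cases = [
    ("M2 or M18, ten variables", 0.80, 10),
    ("M7, ten variables", 0.42, 10),
    ("M7, best five variables", 0.19, 5),
]
for label, r2, p in cases:
    print(f"{label}: R2 = {r2:.2f}, adjusted = {adjusted_r2(r2, 25, p):.3f}")
```

Under these assumptions an R-squared of .80 with ten predictors shrinks
to roughly .66 after adjustment, while both figures for M7 fall to about
zero, which agrees with the judgment that the M7 model has no practical
value.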

For the cases in which we can account for a reasonable amount of
the variance, it is instructive to look more closely at the order in
which the independent variables enter into the regression. For three
regressions, LES is the first variable, from which we can conclude that
the position in the curriculum is an extremely influential factor in
problem difficulty. On the average LES alone accounts for more variance
than any other single variable. The second most influential variable
seems to be IF, which is among the first five variables to enter the
regression in all cases. REIT is among the first five variables in four
out of the six cases, and would have appeared more influential if LES,
with which it is highly correlated, had been removed from the list.
Another variable of some importance is HELP, which entered second in two
cases and fifth in one case. In summary, we conclude that in predicting
problem difficulty the variables with greatest influence are the position
in the curriculum, the predicted proportion of conditional commands,
whether or not loops or subroutines are required, and whether or not the
curriculum offers an example for the student to model his solution on.
The first and last of these might be characterized as curriculum-dependent,
whereas the other two are more related to the problem itself than to the
context in which it is found.

These conclusions are subject to some interpretation, however. For
example, in the regressions for M1 and M2 (proportions correct), REIT
entered only in the eighth step, and we might conclude that whether or not
loops or subroutines are required has little effect on proportion correct.
This conclusion cannot be drawn with impunity, however, because of the high
correlations between REIT and both LES and VOC. Since both LES and VOC
entered into the regressions earlier, they took out a great deal of the
variance that would otherwise be attributed to REIT.

Another fact worth commenting on is that although LES, HELP, and IF
entered as the first three variables in the regressions for M1 and M2,
they did not enter until the fifth, sixth, and seventh steps in the
regression for M18, the relative efficiency of students' work.

Before turning our attention to other aspects of students'
performance on programming problems, we would like to make a few comments on
the analysis described in this chapter. First, although we defined and
explored in some depth a large number of measures of different aspects
of problem difficulty, there is another sizable set of variables that
might be even more precise measures of problem difficulty, and those are
measures based on the time required by students to produce solutions.
We did not consider time-dependent measures here because the instructional
system did not record elapsed time in any precise way. The reader who is
interested in analyses of problem-solving behavior using time-dependent
measures of problem difficulty is referred to Dr. James Maloney's paper
"An Investigation of College Student Performance on a Logic Curriculum
in a Computer-assisted Instruction Setting." The methods of analysis
used by Dr. Maloney are similar to those used in this chapter, and, in
fact, provided a model from which this author drew ideas about both
method and definitions of independent variables.

One independent variable of possible importance was inadvertently
omitted, a measure of the amount of guidance available to the students
in the optional hints. This variable is akin to the HELP variable used
to measure the (non-optional) guidance given in the problem statement.
In view of the fact that HELP was quite effective in predicting M1 and
M2, it seems reasonable that a HINT variable might also have been worth
considering.


CHAPTER VIII

Classification of Correct and Nearly Correct Solutions



In the preceding chapters we discussed the number and distribution
of correct solutions. In this chapter and the next we study the kinds
of correct solutions. Correct, and nearly correct, solutions are
classified by type using four different methods of classification, two based
on the forms of programs and two based on the functions. As noted in
Chapter IV, 427 of the 747 first attempts made by students were correct.
In addition, there were 124 solutions nearly enough correct that they
could be unambiguously classified according to each of the four
classification schemes. These nearly correct solutions are included in the
analyses described here, giving a total of 551 student-written programs,
an average of 22 per problem.

In this chapter we use "solution," or more loosely "program," to
refer to both the stored program and the direct commands used to execute
it. For the first few problems, through L11-11, a solution is not
considered correct unless the student executed the program using the input
values specified in the problem statement. (He could, of course, use
additional values also.) After L11-11 a correct solution must include
the commands needed to execute the program but the actual values used
are immaterial. Because of this discontinuity in the grading scheme
our definitions of program equivalence will contain special clauses for
the treatment of solutions written after Problem L11-11.

For the first two definitions of program equivalence we are
concerned with the forms of programs, and will define equivalence in terms
of substitution of equivalent commands or sequences of commands. These
first two kinds of equivalence, which we will call formal identity and
formal equivalence, are defined in terms of allowable substitutions as
follows: two programs are said to be formally identical (formally
equivalent) if one of them can be transformed into the other by any finite
sequence of substitutions, with possible repetition, from the list of
substitutions allowable for formal identity (formal equivalence).

Since the allowable substitutions for formal equivalence include
the substitutions allowed for formal identity, it follows that any two
formally identical programs are also formally equivalent, although the
converse is not necessarily true. Furthermore, all of the allowable
substitutions preserve semantics, so any two programs that are either
formally identical or formally equivalent will also be functionally
equivalent (which will be our fourth definition of program equivalence).

The formal substitution rules are described below. Rules 1 to 7
define formal identity, and Rules 1 to 15 define formal equivalence.

Rule 1. If two commands are identical except for optional spaces,
one may be substituted for the other. For example, spaces may be freely
used in simple algebraic expressions, so these two commands are
equivalent under Rule 1:

     TYPE X + Y - Z

     TYPE X+Y-Z

Rule 2. If two real numerals are equal when rounded to three
decimal places, one may be substituted for the other. The following are
equivalent under Rule 2:

     0.3937
     .3937
     .394

Rule 3. If two programs differ only in literal step numbers, one
may be substituted for the other. The step numbers must be in the same
numerical sequence within parts, and references to the steps (as in DO
or TO commands) are substituted for concurrently. As an example, the
following two programs are equivalent under Rule 3:

     1.1 DO PART 2 IF X < 1          3.15 DO PART 1 IF X < 1
     1.2 TYPE X                      3.17 TYPE X
     2.1 SET X = -X                  1.3 SET X = -X
     DO PART 1                       DO PART 3

Notice that although the sequence of steps within a part must remain in
numerical order, part numbers need not.

Rule 4. If two LET commands differ only in the letter used as a
dummy variable, one may be substituted for the other. Rule 4 applies
only to variables bound within a LET command whereas Rule 5 applies to
other variables as well.

Rule 5. If two programs differ only in the letters used for
variables, one may be substituted for the other. The variables referred
to here may be real variables, names for functions, or names for lists
of numbers.

The next two rules are applicable only for problems after L11-11.

Rule 6. In a direct command of the form

     DO PART n FOR x = m1

m1 may be replaced by m2 where m1 and m2 are any numbers, lists of
numbers, or range specifications. In the above n is a part number and x
is any variable.

Rule 7. In a direct command of the form

     SET x = n1

n1 may be replaced by n2 where n1 and n2 are any numbers.
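
Several of these rules lend themselves to a canonical-form check: reduce
each program to a normal form and compare the results. The sketch below
covers only Rules 1, 2, and 5 and is not the procedure used in the study.
It assumes that variables are single letters, that AID keywords are
longer words, and that renaming variables in order of first appearance is
an acceptable normal form; the name canonical is hypothetical.

```python
import re

NUMERAL = r"\d*\.\d+|\d+"
TOKEN = re.compile(NUMERAL + r"|[A-Za-z]+|\S")


def canonical(program):
    """Reduce a list of AID command strings to a comparable normal form.

    Rule 1: spaces are dropped by tokenizing.
    Rule 2: numerals are rounded to three decimal places.
    Rule 5: single-letter variables are renamed in order of first use.
    """
    names = {}      # original letter -> canonical name
    result = []
    for command in program:
        tokens = []
        for tok in TOKEN.findall(command):
            if re.fullmatch(NUMERAL, tok):
                tokens.append(f"{round(float(tok), 3):.3f}")
            elif tok.isalpha() and len(tok) == 1:
                key = tok.upper()
                names.setdefault(key, f"v{len(names)}")
                tokens.append(names[key])
            else:
                tokens.append(tok.upper())   # keyword or punctuation
        result.append(tuple(tokens))
    return result


print(canonical(["TYPE X + Y - Z"]) == canonical(["TYPE A+B-C"]))
print(canonical(["SET X = 0.3937"]) == canonical(["SET Y=.394"]))
```

Two programs are treated as identical under these three rules when their
canonical forms are equal. Step renumbering (Rule 3), dummy variables in
LET commands (Rule 4), and the direct-command rules (Rules 6 and 7) are
not handled, and values that fall exactly on a rounding boundary may be
affected by binary floating-point representation.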

These first seven rules define formal identity. Although the above
definitions are not stated with complete precision, they can be
reformulated precisely to define a decision procedure for formal identity,
the only one of our four equivalence relations that does admit such a
procedure. When the set of solutions for each problem was classified
using the above rules, the most numerous class was labeled A, the
second most numerous B, etc. These designations are used in Appendix A,
which contains a complete list of the types of solutions for each of the
25 problems. For some problems a great reduction in the number of types
is attained by this method of classification. For example, for Problem
L8-9, there were 36 correct solutions but only four distinct types after
reduction by formal identity. For other problems, the differences
between students' programs are less trivial and consequently the
reduction by formal identity is less effective. For instance, no two
solutions for L16-6 are formally identical, so no reduction in the
number of types is achieved. The number of types, or equivalence
classes, under formal identity varies from 3 to 13 as shown in Table 19.
There are an average of 8 equivalence classes per problem, with an average
of 2.7 solutions per equivalence class. As we will see, formal identity
is the weakest of the four methods of classification used.

                                Table 19

          Number of Programs in Each Equivalence Class, Using
                Four Definitions of Program Equivalence

                     Number of Equivalence Classes when Partitioned by...
Problem   Number of    Formal     Formal        Algorithmic   Functional
Number    Programs     Identity   Equivalence   Equivalence   Equivalence

L5-30        33          ...          5            ...             1
L8-9         36            4          1              1             1
L8-27        29          ...          5              4             1
L8-28        26          ...          2              2             1
L9-3         23          ...          3              3             1
L9-8         26          ...          4              4             2
L10-12       27          ...          2            ...             1
L10-19       30          ...          1              1             1
L11-11       26          ...          5              6             1
L12-4        34          ...          3              2             1
L12-29       25          ...          7              5             2
L15-15       20          ...         11              2             2
L15-17       23          ...          6              3             1
L15-18       18          ...          6              3             2
L15-21       25          ...          3              1             1
L16-4        29          ...          7              4             2
L16-6         8            8          8              5             2
L23-7        13          ...         11              7             3
L24-11       18          ...          6              3             1
L25-8        23          ...          1              1             1
L26-5        22          ...          8              6             2
L29-19        6          ...          5              4             1
L32-5         9          ...          7              8             5
L32-8        16          ...          8              6             3
L32-19        6          ...          4            ...             3

Totals      551          204        129             91            42

Formal equivalence, the second equivalence relation, is also defined
in terms of substitution rules. In addition to the above seven rules,
Rules 8 to 15 are used to determine formal equivalence.

Rule 8. The phrases

     FOR x = a(b)c

and

     FOR x = a,a+b,a+2b,...,c

may be interchanged provided the allowable length for AID commands is
not exceeded. In the above, x refers to any variable; a, b, and c are
real numbers; and a+b, a+2b, etc., are real numbers whose values are a+b,
a+2b, etc. Under this rule the following phrases are equivalent:

     FOR A = 4(2)9

     FOR A = 4,6,8,9
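
The expansion behind Rule 8 can be sketched as a short function. This is
an illustration only: it assumes a positive step b with a <= c, and it
assumes that the final value c always closes the list, which is inferred
from the example 4(2)9 = 4,6,8,9 rather than from any AID specification.
The name expand_range is hypothetical.

```python
def expand_range(a, b, c):
    """Expand the range a(b)c into the explicit list a, a+b, a+2b, ..., c."""
    if b <= 0 or a > c:
        raise ValueError("this sketch assumes a positive step and a <= c")
    values = []
    k = 0
    while a + k * b < c:
        values.append(a + k * b)
        k += 1
    values.append(c)   # the final value c always closes the list
    return values


print(expand_range(4, 2, 9))   # [4, 6, 8, 9]
```

Two FOR phrases could then be compared under Rule 8 by comparing their
expanded lists, subject to the limit on command length that the rule
imposes.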

The next five rules provide for substitutions of single commands
for sets of commands or for permutations in the sequence of commands
provided such substitutions do not change the function of the program.
To avoid semantic changes, we require that the commands to be
substituted for be contained between "critical points" in the program. A
critical point is either the beginning or end of a part or a step to
which branching may occur. Thus, if Rule 11, for example, would
ordinarily allow us to interchange Steps 7.3 and 7.4, this would be allowed
only if there is no branch command (TO or DO) elsewhere that refers to
Step 7.4. This restriction applies to Rules 9 to 13.
Rule 9. The sequence of commands

     TYPE e1
     TYPE e2
      .
      .
     TYPE en

may be interchanged with the single command

     TYPE e1,e2,...,en

where the ei's are algebraic expressions, provided that the single TYPE
command does not exceed the allowable length for AID commands.

Rule 10. The sequence of n commands

     DO e
     DO e
      .
      .
     DO e

may be interchanged with the single command

     DO e, n TIMES

where e is the specification of a part or step.
Rule 11. The two commands

     SET x = e
     DEMAND y

may be interchanged if the expression e contains no occurrences of the
variable y and if the variables x and y are not identical. Thus, we
can interchange

     SET A = 2*B
     DEMAND C

or

     SET A = 2*A
     DEMAND C

but not

     SET A = 2*C
     DEMAND C

or

     SET M = 2*A
     DEMAND M
Rule 12. The two commands

     SET x = e1
     TYPE e2

may be interchanged if the expression e2 contains no occurrences of the
variable x.

Rule 13. The two commands

     SET x = e1
     SET y = e2

may be interchanged if x and y are distinct variables, and e2 contains
no occurrence of x, and e1 contains no occurrence of y. Either or both
of the SET commands may have appended IF clauses, provided x and y do
not occur in the Boolean expressions used in the IF clauses.
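
The side conditions of Rules 11 to 13 reduce to checks on which variables
occur in which expressions. The following sketch states those checks
under the assumption that variables are single letters; the function
names are hypothetical, IF clauses are not handled, and the critical-point
restriction described earlier must be checked separately.

```python
import re


def variables_in(expression):
    """Single-letter names occurring in an expression (assumed convention)."""
    words = re.findall(r"[A-Za-z]+", expression)
    return {w.upper() for w in words if len(w) == 1}


def may_swap_set_demand(x, e, y):
    """Rule 11: SET x = e and DEMAND y."""
    return x != y and y not in variables_in(e)


def may_swap_set_type(x, e2):
    """Rule 12: SET x = e1 and TYPE e2."""
    return x not in variables_in(e2)


def may_swap_set_set(x, e1, y, e2):
    """Rule 13 without IF clauses: SET x = e1 and SET y = e2."""
    return (x != y
            and x not in variables_in(e2)
            and y not in variables_in(e1))


# The four cases given under Rule 11.
print(may_swap_set_demand("A", "2*B", "C"))   # True
print(may_swap_set_demand("A", "2*A", "C"))   # True
print(may_swap_set_demand("A", "2*C", "C"))   # False
print(may_swap_set_demand("M", "2*A", "M"))   # False
```

Each function answers only whether the interchange leaves the values
seen by the two commands unchanged; whether a branch elsewhere targets
one of the steps is a separate question.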

By a suitable reformulation of Rules 1 to 13, a decision procedure
could be written for an equivalence relation determined by them. The
next two rules for formal equivalence do not admit of a decision
procedure, however, since both are based on algebraic equivalence.



Rule 14. If a command contains an algebraic expression e1, then
any algebraically equivalent expression e2 may be substituted for it.

Rule 15. If a command contains a Boolean expression e1, then any
logically equivalent expression e2 may be substituted for it. As an
example, the two commands

     TYPE X IF X < Y + 7

and

     TYPE X IF X - 7 < Y

are equivalent under Rule 15. Notice, however, that the commands

     TYPE X IF X < Y + 7

and

     TYPE X IF X < 7 + Y

are equivalent under either Rule 14 or Rule 15. Thus, Rules 14 and 15,
unlike other pairs of rules, are not independent.

The 15 rules above constitute the complete definition of formal
equivalence. As mentioned, only the last two rules prevent the
formulation of a decision procedure for formal equivalence. It is clear from
an inspection of the solutions listed in Appendix A that a few simple
rules for algebraic substitution would serve to define a decision
procedure for the algebraic expressions found in the data, so a partial
solution to this problem could be attained if it were desirable to
implement a routine for determining formal equivalence.

Under formal equivalence a greater reduction in types is made than
under formal identity, as can be seen from Table 19. The 551 solutions
reduce to 129 types, an average of five equivalence classes per problem
compared to the eight equivalence classes per problem for formal
identity. Under formal equivalence there is an average of 4.3 solutions
per equivalence class as compared to 2.7 under formal identity.

We remark again that formal identity is included in formal
equivalence so that any two programs that are formally identical are also
formally equivalent, and any two programs that are not formally
equivalent are also not formally identical. The next two equivalence relations
to be discussed, algorithmic equivalence and functional equivalence, also
include formal identity. Functional equivalence also includes formal
equivalence but it is not the case that algorithmic equivalence includes
formal equivalence. Thus, it is possible to have two programs that are
algorithmically equivalent but not formally equivalent and vice versa.
What we have then is not a strict hierarchy in equivalence relations but
a partial ordering. Denoting formal identity by I, formal equivalence
by E, algorithmic equivalence by A, and functional equivalence by F,
this can be expressed symbolically as follows:

     I ⊂ E ⊂ F
     I ⊂ A ⊂ F
     not E ⊂ A
     not A ⊂ E

For the third of the four equivalence relations, two programs are
considered equivalent if they use the same algorithm regardless of formal
characteristics of the programs themselves. Thus, algorithmic
equivalence is concerned with the dynamics of the programs, whereas formal
identity and formal equivalence were concerned with static qualities.
For our purposes the algorithm used in a program is determined by the
values taken on by real variables, the output, and the sequence in which
these occur. The names used for the variables are immaterial and we
will be concerned only with those variables that take on real numbers as
values. Hence, we will be interested in the values of indexed variables
such as X(3) and A(I,J), but not in user-defined functions or in forms.
As for output, we will generally be concerned only with computed numeric
values and not with the content of whatever text is also output; thus,
we will consider any two text strings to be equivalent except for the
few problems (e.g., L15-21) for which the only expected output is text,
and in those cases we will consider two text strings to be identical if
their content has the same (English) meaning.

To clarify this notion we will represent the stored data at any
point in time as an n-tuple of the values of the n variables to which
values have been assigned. The order of the numbers in an n-tuple is
dependent upon the order in which the variables were first given values;
thus, the first number in the n-tuple is the current value of the first
variable to which any value was assigned, etc. As an example, consider
the following simple program:

     Example 1.   1.1 SET I = 1
                  1.2 SET X = I*2
                  1.3 TYPE X
                  1.4 SET I = I+1
                  1.5 TO STEP 1.2 IF I < 3

The first variable to be used by this program is I, so the value of I
will always appear as the first number in the n-tuple representing the
stored data. These n-tuples, in the order in which they occur, are

     (1)
     (1,2)
     (2,2)
     (2,4)
     (3,4)

We will also be interested in the output, and how it fits into the above
sequence, and will represent the sequence of stored data and output as
follows:

     (1)
     (1,2)
     Output: 2
     (2,2)
     (2,4)
     Output: 4
     (3,4)

The above sequence represents what we will call the algorithm for the
program in Example 1. Example 2 is another program which differs from
Example 1 only in the IF clause used in the fifth step.

     Example 2.   2.1 SET I = 1
                  2.2 SET X = I*2
                  2.3 TYPE X
                  2.4 SET I = I+1
                  2.5 TO STEP 2.2 IF I <= 2

If we write the algorithm for Example 2, we find it to be identical with
that for Example 1. Hence, the two programs are algorithmically
equivalent. Notice, however, that these two programs are not formally
equivalent since the expressions I < 3 and I <= 2 are not logically
equivalent. Our third example is a program that is formally equivalent
to Example 1 but not algorithmically equivalent. (As we will see later,
all three programs are functionally equivalent.)

     Example 3.   3.1 SET I = 1
                  3.2 SET X = I*2
                  3.3 SET I = I+1
                  3.4 TYPE X
                  3.5 TO STEP 3.2 IF I < 3

This program is simply Example 1 with the third and fourth steps
interchanged. By Rule 12 for formal equivalence, we find Examples 1 and 3
to be formally equivalent. However, the algorithm for Example 3 is

     (1)
     (1,2)
     (2,2)
     Output: 2
     (2,4)
     (3,4)
     Output: 4

which is not identical to the algorithm for Example 1.
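
The three traces above can be reproduced mechanically. The sketch below
transliterates Examples 1 to 3 into Python instead of interpreting AID;
the helper names (run, set_var, type_out) are hypothetical. Stored data
is kept in a dictionary, which preserves the order of first assignment,
and every assignment or output appends one entry to the trace.

```python
def run(type_before_increment, keep_looping):
    """Trace one of the three example programs.

    Returns the stored-data tuples and ("Output", value) entries in the
    order in which they occur.
    """
    state = {}    # insertion order = order of first assignment
    trace = []

    def set_var(name, value):
        state[name] = value
        trace.append(tuple(state.values()))

    def type_out(value):
        trace.append(("Output", value))

    set_var("I", 1)                          # first step
    while True:
        set_var("X", state["I"] * 2)         # second step
        if type_before_increment:            # Examples 1 and 2
            type_out(state["X"])
            set_var("I", state["I"] + 1)
        else:                                # Example 3
            set_var("I", state["I"] + 1)
            type_out(state["X"])
        if not keep_looping(state["I"]):     # fifth step
            break
    return trace


example1 = run(True, lambda i: i < 3)
example2 = run(True, lambda i: i <= 2)
example3 = run(False, lambda i: i < 3)

print(example1 == example2)   # True: algorithmically equivalent
print(example1 == example3)   # False: same output, different sequence
```

Comparing traces in this way settles equivalence only for programs
without input; for programs that read input, the comparison has to hold
for every input value.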

The examples above are too simple to fully illustrate the concept
of algorithmic equivalence since they do not use input data. Following
is a simple example of a program that uses a single numeric input.

     Example 4.   4.1 SET Y = X IF X >= 0
                  4.2 SET Y = -X IF X < 0
                  4.3 TYPE Y

For this program the sequence of stored data and output depends upon
the value preassigned to X, the input variable. For example, if X is
-5 the sequence is

     (-5)
     (-5,5)
     Output: 5

For each value of X there is a different sequence and it is the entire
set of such sequences that determines the algorithm.

Algorithmic equivalence, as defined above, is slightly more powerful
for the set of programs under consideration than formal equivalence.
There are an average of 3.6 equivalence classes per problem, as compared
to the five equivalence classes for formal equivalence and the eight
classes for formal identity. There is an average of 6.1 programs per
class, whereas formal equivalence yields 4.3 programs per class and
formal identity 2.7 programs per class. The classes under algorithmic
equivalence are labeled A1, A2, etc., in Appendix A, which lists the
classification of every program in the data.

The fourth, and last, equivalence relation used in classifying
student-written programs is functional equivalence. In determining the
function of a program we consider only the output and not the form of
the program or the values of any variables other than input and output
variables. As with algorithmic equivalence, text in which numeric
results are imbedded is ignored; only for those few programs whose
output is non-numeric do we consider the text that is printed. Also, as
for algorithmic equivalence, numeric results are rounded to three
significant digits. Functional equivalence is the simplest and the most
powerful of the four methods of classification. There are an average of
1.7 equivalence classes per problem, with an average of 13.1 programs
in each class. For 14 of the 25 problems all of the student-written
programs were functionally equivalent.
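
A comparison of this kind can be approximated by running two programs on
the same inputs and comparing their rounded outputs. The sketch below is
an illustration, not a decision procedure: agreement on a sample of
inputs does not prove equivalence. The function with_sqrt is a
hypothetical alternative solution, and example4 is a Python
transliteration of Example 4.

```python
def round_sig(value, digits=3):
    """Round a number to the given count of significant digits."""
    return float(f"{value:.{digits}g}")


def outputs_agree(program_a, program_b, inputs):
    """True if both programs give the same rounded outputs on every input."""
    return all(
        [round_sig(v) for v in program_a(x)]
        == [round_sig(v) for v in program_b(x)]
        for x in inputs
    )


def example4(x):
    # Y = X if X >= 0, Y = -X if X < 0, then TYPE Y
    y = x if x >= 0 else -x
    return [y]


def with_sqrt(x):
    # Hypothetical alternative: TYPE the square root of X squared
    return [(x * x) ** 0.5]


samples = [-5, 0, 2.5]
print(outputs_agree(example4, with_sqrt, samples))        # True
print(outputs_agree(example4, lambda x: [x], samples))    # False
```

The two absolute-value programs differ in form and in stored data, so
they fall into different classes under the first three relations, yet
they agree on output for these sample inputs.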

From the wide variety of possible methods of classification for
programs, we have chosen four to study in some depth. In choosing these
four methods, we have been guided by the following considerations.
(1) We wanted to use equivalence relations that conformed to intuitive
notions of program equivalence. (2) The equivalence relations should
show considerable spread in their "grouping power" over the set of data;
that is, the weakest of the relations should provide only a minor
reduction in the number of types, whereas the strongest should come
close to grouping all programs into a single class. (3) The equivalence
relations should be mathematically defensible, in that the concept of
equivalence is well-defined independent of the data. In relation to (3)
we would also have preferred to exhibit equivalence relations for which
a decision procedure could be defined. Except for I, formal identity,
our definitions do not satisfy this requirement, and we saw no way of
satisfying it without seriously violating either (1) or (2). By defining
formal equivalence more strictly, in particular by suitably restricting
substitution of algebraic expressions, we could have provided a
definition that would admit of a decision procedure. For future studies
we recommend that this approach be explored in more depth. Our
recommendation for this is based on the feeling that formal equivalence
most nearly approaches the intuitive notion of equivalence expressed by
students in phrases such as "these two programs are really the same" or
"these two programs may do the same thing but they do it quite
differently." A programming consultant, automated or human, who is
trying to help a student complete a partially written program or debug
a faulty program would be most likely to be effective if he (or it) can
guide the student towards a formally equivalent correct solution. An
automated consultant could do this only if it were capable of
determining to which formal equivalence class the student's partial
solution belonged. In other words, the consulting routine would need a
decision procedure for a formal equivalence relation that extended to
incomplete and incorrect solutions as well as correct solutions.

Although three of our four equivalence relations do not admit of
decision procedures, requirements (2) and (3) were well satisfied.
Whether or not requirement (1)--that the equivalence relations are
intuitively valid--is satisfied is left to the reader to decide. In
connection with this we mention several other possible means of
classification that could have been used. Formal identity and formal
equivalence, for instance, are but two of a very large number of
equivalence relations based on substitution of semantically equivalent
parts of programs. Any one of the substitution rules defined above, or
any arbitrary set of those rules, would define an equivalence relation.
There are also a large number of applicable substitution rules that we
did not list. One, for example, would allow the permutation of two
adjacent DEMAND commands. Another would allow the substitution of

     SET x = e
     TYPE x

for

     TYPE e

where x is a variable that does not occur in e or elsewhere in the
program. One could also devise more complicated rules of substitution,
such as the substitution of iterated subroutines for certain kinds of
loops. We did not use the last-mentioned of these possibilities because
we felt that such a substitution would allow us to equate programs that
students feel to be quite different. As for the other two possibilities
above, we did not use them (and many similar rules) because there was no
instance in the data where they could be applied; all of the 15
substitution rules listed for formal equivalence were actually used in
classifying the data.
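To make the SET/TYPE substitution rule concrete, the following Python sketch applies it to a program held as a list of tuples. The tuple representation, the function names, and the identifier pattern are assumptions made for illustration; they are not AID syntax and were not part of the classification procedure.

```python
import re


def variables(text):
    """Return the identifier-like tokens occurring in an expression string."""
    return set(re.findall(r"[A-Za-z_]\w*", text))


def collapse_set_type(program):
    """Rewrite  SET x = e; TYPE x  as  TYPE e  wherever the rule is valid.

    program is a list of tuples, ("SET", var, expr) or ("TYPE", expr).
    The rule applies only if x occurs neither in e nor anywhere else
    in the program.
    """
    result = []
    i = 0
    while i < len(program):
        cmd = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if (cmd[0] == "SET" and nxt is not None
                and nxt[0] == "TYPE" and nxt[1] == cmd[1]):
            x, e = cmd[1], cmd[2]
            others = program[:i] + program[i + 2:]
            used_elsewhere = any(
                x in variables(part) for c in others for part in c[1:]
            )
            if x not in variables(e) and not used_elsewhere:
                result.append(("TYPE", e))
                i += 2
                continue
        result.append(cmd)
        i += 1
    return result


simple = [("SET", "X", "A*B"), ("TYPE", "X")]
recursive = [("SET", "X", "X+1"), ("TYPE", "X")]
reused = [("SET", "X", "A*B"), ("TYPE", "X"), ("TYPE", "X+1")]
```

Only `simple` is rewritten; in `recursive` the variable occurs in its own defining expression, and in `reused` it occurs elsewhere in the program, so both are left unchanged.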

Besides the many kinds of formal equivalence relations that could
be used, there are a number of possible variants on algorithmic and
functional equivalence. We might, for example, have differentiated
programs on the basis of the output text; in studying programs written
in languages with string manipulation features, such distinctions would
be of more importance. In defining algorithmic equivalence, we
considered the sequence of values for all variables used by the program;
for more complex programs than those found in the data analyzed here, it
might be wise to exclude variables bound in subroutines or even
variables bound in simple loops. For block structured languages only
global variables might be considered. As for functional equivalence, a
more powerful equivalence relation could be defined by considering
functions to be equivalent if they differed by at most a fixed number of
values.

In summary, out of the wide variety of possible, well-defined
equivalence relations, we chose four that were sufficiently different to
illustrate the spectrum of possibilities, guided in our choice to a
large extent by intuitive appeal. In the next chapter we will analyze
the effects of each of these equivalence relations on the given set of
data.



CHAPTER VIII

Diversity of Solutions

In looking at programs written by students (Appendix B) one is struck
by the fact that for some problems most students produced very similar
looking programs, whereas for others there seem to be few points of
similarity. In this chapter we devote ourselves to the study of the
diversity of programs written by students and attempt to explain why
there is more diversity for some problems than for others. To do this
we will first introduce a suitable measure of diversity and then
investigate the statistical relationship between diversity and various
measurable qualities of the problems and the curriculum.

The amount of diversity observed in a set of solutions to a given
problem is dependent not only upon the solutions themselves but upon
one's notion of similarity, or equivalence. Thus, for different
equivalence relations the observed diversity may be different even for
the same set of data. In this study we are concerned only with the four
concepts of program equivalence discussed in the preceding chapter, and
as a result will have four different definitions for diversity:
diversity of function, diversity of algorithm, diversity of equivalent
forms, and diversity of identical forms (where by "identical" we mean
formally identical, as defined in Chapter VII).

Since the measure of diversity used here is not widely known, we
discuss it briefly before describing the statistical analyses. A more
complete and precise mathematical discussion of the measurement of
diversity is given in Appendix C.
Suppose $\mathcal{S}$ is a population that is partitioned into k
classes, and that the probability that an element is in the i-th class
is $p_i$ for each i = 1,2,...,k. The diversity of $\mathcal{S}$, for the
given method of classification, is

\[ \delta = 1 - \sum_{i=1}^{k} p_i^2 . \]

The value of $\delta$ will be between 0 and 1, and will be 0 only if all
elements of $\mathcal{S}$ are in a single class. For a fixed value of k
the largest value of $\delta$ occurs when the $p_i$'s are equal, that
is, when the members of $\mathcal{S}$ are evenly distributed among the
classes. For an even distribution, the value of $\delta$ increases with
an increasing number of classes, approaching 1 as a limit.

Let S be a sample* of size N from the population $\mathcal{S}$, and let
$n_i$ be the observed number of occurrences of the i-th class for
i = 1,2,...,k. Then $p_i$ can be estimated by $n_i/N$, and it would be
natural to define the sample statistic for diversity to be

\[ 1 - \sum_{i=1}^{k} \left( \frac{n_i}{N} \right)^2 . \]

It transpires, however, that as an estimator of $\delta$, this statistic
is biased, so we define

\[ d = \frac{N}{N-1} \left[ 1 - \sum_{i=1}^{k} \left( \frac{n_i}{N} \right)^2 \right] \]

as the estimator of the parameter $\delta$. d, as shown in Appendix C,
is a consistent, unbiased estimator of $\delta$.

A little algebraic manipulation will show that

\[ d = 1 - \sum_{i=1}^{k} \frac{n_i (n_i - 1)}{N (N - 1)} . \]

From this formulation we see that d attains a maximum of 1 whenever each
member of S is in a separate class, that is, whenever each $n_i$ is 1.
We also see that the minimum of d is 0, which occurs whenever all
members of the set are in the same class.

*We consider only unordered samples with replacement.
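As a worked illustration, the statistic can be computed from a list of class sizes. The sketch below assumes that d has the usual unbiased form $d = 1 - \sum n_i (n_i - 1) / (N (N - 1))$, which agrees with the properties stated above (maximum 1 when every $n_i$ is 1, minimum 0 when all members share one class); the function names and the sample counts are hypothetical.

```python
def naive_diversity(counts):
    """Plug-in statistic 1 - sum((n_i / N)**2); biased as an estimator."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)


def diversity(counts):
    """Assumed unbiased form: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two observations are needed")
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical class sizes for one problem: four programs in two classes.
counts = [3, 1]
d = diversity(counts)              # 1 - 6/12
d_plug_in = naive_diversity(counts)  # 1 - (9/16 + 1/16)
```

The two statistics differ by the factor N/(N-1), so the correction matters most for problems with few attempted solutions.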

It is evident that diversity is independent of the indices used for
the $p_i$'s or $n_i$'s. For example, any reordering of the subscripts of
$n_1, n_2, \ldots, n_k$ would not change the calculated value of d.
Thus, diversity is invariant under any 1-1 transformation of the
indices, which is all that is required to assure that the formula is
appropriate for categorical scales of measurement.

Although we have shown that d is appropriate as a statistic, it
remains to be shown that this formula is an appropriate measure of
diversity. In regard to this question the germane property of d is
that it is the probability that two elements drawn at random will not
be equivalent. As a result, if we remove one element from a more
numerous class and place it in a less numerous (perhaps empty) class,
thereby increasing the diversity, the value of d is increased.

As mentioned, the value of diversity is dependent upon the underlying
equivalence relation. For the same set of data an equivalence
relation with more grouping power will produce a lower value for d than
a less powerful equivalence relation. To emphasize the dependence of
diversity on the equivalence relation we will denote diversity of
function by $d_F$, diversity of algorithm by $d_A$, etc.

Let us now turn to the data to find the amount of diversity for each
problem using four measures of diversity, one for each of the four
equivalence relations I, E, A, and F. In Table 20, the statistics for I,
formal identity, are shown; the number of solutions in each equivalence
class is listed for each problem, and the value of $d_I$ is shown in the
last column. The corresponding statistics for the equivalence relations
E, A, and F are shown in Tables 21, 22, and 23. For formal identity,
the diversity ranges from 0.12 to 1.00 with a mean of 0.76. For formal
equivalence the range is even larger--from 0 to 1.00--and the mean is
0.56. Thus, the diversity of identical forms is considerably greater
than the diversity of equivalent forms. Even with this sizable change in
the average diversity, there are five problems for which there is no
difference between $d_I$ and $d_E$ and another five for which the
difference is less than 0.1; for these 10 problems essentially all of
the diversity in form is due to the trivial variations allowed under
formal identity.

Comparing the values of $d_I$ and $d_E$ problem by problem, we note that
$d_E$ is never greater than $d_I$, a necessary consequence of the fact
that formal identity is included in formal equivalence.
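The reason a coarser relation can never give a larger value may be seen by merging two classes of sizes a and b: the term (a+b)(a+b-1) exceeds a(a-1) + b(b-1) by 2ab, so the subtracted sum can only grow. The Python sketch below checks this on made-up class sizes. It assumes the form $d = 1 - \sum n_i (n_i - 1) / (N (N - 1))$ for the statistic; the counts and the helper `merge` are hypothetical.

```python
def diversity(counts):
    """Assumed form of d: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    n = sum(counts)
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def merge(counts, i, j):
    """Merge classes i and j, as a coarser equivalence relation would."""
    merged = [c for k, c in enumerate(counts) if k not in (i, j)]
    merged.append(counts[i] + counts[j])
    return merged


# Hypothetical class sizes under formal identity for one problem.
identity_counts = [2, 1, 1]
# Suppose formal equivalence identifies the first two classes.
equivalence_counts = merge(identity_counts, 0, 1)

d_I = diversity(identity_counts)      # 1 - 2/12
d_E = diversity(equivalence_counts)   # 1 - 6/12
```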

The average value of $d_A$ (diversity of algorithm) is 0.43, only
slightly smaller than the 0.56 average for $d_E$. For eight of the 25
problems the values of $d_A$ and $d_E$ are identical, indicating a
strong relationship between the grouping powers of algorithmic
equivalence and formal equivalence. This is in marked contrast to the
relationship of either of these relations to functional equivalence. For
functional equivalence the average diversity is 0.15, about one-third
the value of the average of $d_A$, even though six of the values of
$d_A$ and $d_F$ are equal. For a more precise comparison of the grouping
powers of the four equivalence relations, see the correlation matrix for
d given in Table 24. All correlations are high, as might be conjectured
from the logical relationships between the equivalence relations. The
values of r are all greater than 0.47 and the highest value is 0.88 for
the correlation between algorithmic equivalence and formal equivalence.

In scanning the various values of d, we note an increase in diversity
with problem number. This is most noticeable in the case of functional
equivalence but is also true for the other three relations. From this
we conjecture that students' programs tend to become more diverse as the
students gain experience. The relationship is far from perfect, however,
indicating that other variables also have an effect on the amount of
diversity. It seems reasonable to suppose that certain problems lend
themselves more readily to a variety of solutions. Look, for example, at
the values of d for problem L32-19: $d_I$ = 0.93, $d_E$ = 0.80,
$d_A$ = 0.80, and $d_F$ = 0.73, all of which are well above the averages
for the respective equivalence relations. Problems L16-6 and L23-7 also
have very high values of d for all four equivalence relations. The
question is, do these problems have similar characteristics that might
account for such a wide variety of solutions? On the other hand, it may
be that the method of instruction, rather than the problem itself,
influences the amount of diversity. Some of the 25 problems contained
quite strong suggestions about the form a correct solution might take;
if students follow these suggestions closely we would expect their
solutions to be very similar.

Table 24

Correlation Matrix for Four Equivalence Relations

          I        E        A        F

I       1.000    0.721    0.728    0.476
E                1.000    0.884    0.561
A                         1.000    0.624
F                                  1.000

I = Formal Identity
E = Formal Equivalence
A = Algorithmic Equivalence
F = Functional Equivalence

To investigate these and other conjectures we ran four step-wise
multiple linear regressions, using $d_I$, $d_E$, $d_A$, and $d_F$ as the
variables to be predicted. For independent variables we used the 10
independent variables described in Chapter VI: IF, ARG, FCT, REIT, LNG,
INPT, LES, HELP, VOC, and NEW. Recall that these 10 variables measure
characteristics of the problems, the curriculum, and the expected
correct solutions (listed in Chapter II), and are independent of the
data.

We have already discussed, in Chapter VI, the correlation coefficients
for the various pairs of independent variables. Before giving the
results of the linear regressions let us look at the correlations
between the independent variables and the amount of diversity. The
correlation coefficients are shown in Table 25. The first point of
interest is that there are a large number of quite high values; 11 of
the coefficients have (absolute) values greater than .5. Secondly, the
signs of the coefficients are constant across the different definitions
of diversity; from our knowledge of the logical and statistical
relationships between the four equivalence relations this is not
surprising. We next note that the three independent variables that
correlate most highly with diversity (on the average) are LES, REIT, and
VOC. As we noted before, these three variables are also highly
correlated with one another (|r| > .6). Although these three variables
correlate with diversity most highly on the average, it is not the case
that they correlate most highly with any given measure of diversity.
Both IF and HELP are more highly correlated with $d_I$ than any of LES,
REIT, or VOC. IF and LNG are the most highly correlated with $d_E$.
Table 25

Correlations Between Independent Variables
and Four Measures of Diversity

                     Measure of Diversity
Variable     $d_I$      $d_E$      $d_A$      $d_F$

IF           0.3^1      0.559      0.286      0.189
ARG          0.14&      0.286      0.307      0.
FCT          0.262      0.242      0.412      0.114
REIT         0.292      0.428      0.568      0.658
LNG          0.189      0.575      0.416      0.065
INPT        -0.067     -0.423     -0.239     -0.204
LES          0.306      0.501      0.566      0.664
HELP        -0.335     -0.355     -0.389     -0.022
VOC          0.235      0.544      0.501      0.570
NEW         -0.250     -0.262     -0.264     -0.085
It is also of some interest to look at the pairs for which the
correlation is low. The average values of r for FCT, INPT, NEW, and HELP
are all less than 0.3. Also the correlation between LNG and $d_F$ is
less than 0.1.

A more revealing picture of the relationships between the independent
variables and the four measures of diversity is given by the results of
the multiple regressions. The derived linear models are given in Table
26, and a summary of the step-wise regressions is given in Table 27. The
amount of variance in diversity accounted for varies upward from 56%,
the best fit being for functional equivalence and the poorest for formal
identity. Thus, using the same set of independent variables, we obtain
somewhat better predictions of diversity than of problem difficulty
(compare Table 18). If we look only at the amount of variance accounted
for by the first five variables to enter the regressions, we find that
we can account for over 60% of the variance for three of the four
measures of diversity (excluding $d_I$).

The order in which the independent variables entered into the
regressions is different in all four cases. But it is instructive to
note that for $d_I$, $d_E$, and $d_A$ the first five variables to enter
are identical. For these three cases the first five variables are IF,
FCT, REIT, LNG, and HELP. Three of these, IF, REIT, and LNG, are among
the first five variables to enter into the regression for $d_F$. Taking
all things into consideration it is probable that REIT and IF are the
two most effective variables for the prediction of diversity. For both
of these the relationship is direct, that is, an increase in the value
of REIT or of IF will cause an increase in diversity. Thus, in
predicting diversity, the most influential variables are (1) whether or
not loops or subroutines are required, and (2) what proportion of the
commands are likely to be conditional. Because of the importance of
conditionals, loops, and subroutines in programming, this is not a
surprising result. The third most influential variable in predicting
diversity is LNG, the length of the expected correct solution. All three
of these variables (REIT, IF, and LNG) are based on characteristics of
the expected correct solutions. To some extent they depend upon the
context of the problem, but they are more largely dependent upon the
problem per se. In contrast, two of the four most influential variables
in the prediction of problem difficulty are LES and HELP, both of which
are entirely dependent upon the curriculum.

Table 26

Linear Models for the Prediction of Diversity

$d_I$ x 100 =  68 + 97 IF - 7 ARG + 20 FCT + S REIT - 2 LNG
               - 0.2 INPT + 3 LES - 3 HELP - 8 VOC - 4 NEW

$d_E$ x 100 =  24 + 87 IF + 3 ARG + 24 FCT + 15 REIT - 1 LNG
               * 3 INPT - 0.4 LES - 10 HELP + 1 VOC - 10 NEW

$d_A$ x 100 =  4 + 6a IF + 26 FCT + 26 REIT + 0.9 LNG
               - 0.4 INPT + 3 LES - 8 HELP - 6 VOC - 6 NEW

$d_F$ x 100 =  - 27 + 62 IF + 11 ARG + 5 FCT + 30 REIT - 3 LNG
               + 1 INPT + 0.8 LES + 10 NEW

Although LES made little contribution to the first three diversity
regressions, it did enter first into the regression for $d_F$,
accounting for 44% of the variance in diversity of function. Thus, at
least one of the diversity predictions is highly dependent upon a
curriculum variable. HELP, however, did not enter into the $d_F$
regression at all. In the other three diversity regressions, HELP was
among the first five variables but did not contribute as much, on the
average, as the other four variables (IF, FCT, REIT, LNG).

In summary, IF and REIT are the two most important variables for
the prediction of diversity and problem difficulty. Further, the
prediction of diversity depends more upon the programming problem itself
than upon the context in which the problem is found, unlike problem
difficulty, which is more dependent upon curriculum context; the
exception is in the prediction of diversity of function, to which LES
made a substantial contribution.
We began this discussion with the conjecture that diversity increases
with problem number. The correlations between LES and the four measures
of diversity (average r > .3) tend to substantiate this view. However,
the more penetrating analysis provided by the multiple regressions
throws this conjecture into doubt for three of the four measures of
diversity ($d_I$, $d_E$, and $d_A$), for which we can conclude that the
implied relationship with LES is spurious and is a result of the merely
statistical relationship between LES and more effective variables such
as REIT and LNG. For the fourth measure of diversity the multiple
regression confirmed the
conjecture that LES is closely related to diversity. We feel that a
deeper analysis might provide evidence for the rejection of this
hypothesis if another curriculum or other sets of programming problems
were chosen for study.

We also conjectured that the HELP variable would be important in
predicting diversity since it measures whether or not a nearly
equivalent program is displayed as an example on which students may
model their solution. Unexpectedly, the linear regressions did not
support this conjecture strongly. For only one measure of diversity did
HELP play an effective part. For algorithmic equivalence HELP entered
into the regression at the third step, increasing the value of r by 8%.
For functional equivalence, HELP did not enter into the regression at
all, indicating a contribution of less than 0.1%. Since HELP is a
curriculum variable that is clearly under the control of the curriculum
designer, it might be fruitful to conduct a future experiment in which
different treatments are used for different groups of students.
Before closing the discussion of diversity we want to present one
further comparison between diversity and problem difficulty. For problem
difficulty we use only our primary measure, M1, the proportion of
correct responses given on the first trial. Following are the
correlations between M1 and each of the four measures of diversity:

: -0.710 
d^ : -0.725 
$d_F$ : -0.463

These correlation coefficients are all negative, indicating that
diversity increases with difficulty (recall that proportion correct is
inversely related to problem difficulty, by definition). Furthermore,
all of the values are substantial; for the two measures with the largest
coefficients, in particular, we could account for half of the variance
in diversity from a knowledge of difficulty.
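The statement about variance follows from squaring the correlation coefficients, since the square of r is the proportion of variance shared by two variables. The short Python check below uses the three coefficients legible in the list above; the variable names are ours and purely illustrative.

```python
# Correlations between M1 and the diversity measures, as listed above.
correlations = [-0.710, -0.725, -0.463]

# The proportion of variance shared by two variables is r squared.
shared_variance = [round(r * r, 3) for r in correlations]
```

The first two squared values come out just above one-half, and the third (for functional equivalence) at a little over one-fifth.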

In one sense these correlations are misleading. It would be easy to
fall into the trap of assuming that students would do better if less
diversity were allowed by the curriculum. Assuming that we could control
the amount of diversity, it seems likely from the results of the
multiple regressions that we could do this only by changing the problems
themselves rather than the way in which they were presented; this
conclusion is based on the fact that diversity seems to depend more upon
problem variables (IF, REIT) than upon curriculum variables (LES, HELP).
Thus, we could decrease the diversity, and increase the proportion
correct, by giving fewer problems that required the use of conditions or
loops. If we did this, would we thereby increase the total learning of
programming? We do not intend to pursue this further here and have only
mentioned the relationship between difficulty and diversity to point out
that the question of trade-offs is a complex and subtle one that needs
study in much greater depth.
CHAPTER IX

Summary


In this paper we have presented a detailed study of 747 computer
programs written by 40 students in response to 25 programming problems
given in a computer-assisted course in programming. The 25 problems
vary widely in kind and difficulty. Some of them are no more than
"finger exercises" designed solely to provide the student with on-line
practice in the use of newly introduced syntactic features of the
programming language. Other problems are sufficiently complex logically
that an experienced programmer must use care to arrive at a correct
solution. The standard solutions to the 25 problems (which we called
"expected correct solutions") varied in length from 2 to 13 commands.
Ten of the 25 problems required the use of one or more conditional
commands, and six of them required the use of either subroutines or
loops. Most of the problems required a program that performed only one
mathematical function but three of them required more than one function
to be performed on the input data. For most problems test values (for
input) were specified in the problem statements; however, for eight
problems either no input was required or the students were free to
choose appropriate values to use in testing their programs.

Eleven of the problems required the use of a newly introduced lexical
item or syntactic feature of the language. The curriculum offered
varying amounts of guidance to the student. For four of the 25 problems,
a complete similar program was shown to the student as a model from
which he could work; an additional seven problems displayed a part of a
program that could serve as a model.
The above-mentioned characteristics of the problems and curriculum
context were used as independent variables for step-wise multiple
regressions (to be discussed below). Other independent variables used in
the regressions measured the position of the problem in the curriculum,
the amount of AID vocabulary so far introduced in the course, and the
number of arguments required by the mathematical function performed by
the program.

Since students were not required to try to solve every problem, we
did not expect to find 40 (students) X 25 (problems) = 1000 attempted
solutions. We did find that a high proportion of the students attempted
to solve the problems--75% on the average. Some of the attempts made by
students were cursory, but on the whole we concluded that most made a
serious effort; the total number of commands given by students was
7063,* and the average number of commands given by the students who
attempted the problem was nearly twice the number needed for an
economical solution to the problem.

Many of the commands given by students were in error, and, in fact,
a large number (i2&f)) were never executed, either because they
contained errors that caused an execution error or because the student
made no effort to use the commands. (He may, for example, have replaced
the command with another before he executed his program.)

In studying the kinds of commands given by students, we classified
the commands according to the AID verb used and found that we could
predict the proportions of the types of commands quite well simply from
the corresponding proportions found in our expected correct solutions.
The amount of variance accounted for by the simple linear model was over
90%. When the commands given by students were classified as either
direct or indirect, we found that nearly half (45%) of the students'
commands were direct; in comparison, only 28% of the commands in the
expected correct solutions were direct.

*On first trial.

In a study of the distribution of correct solutions, we found that
57% of the attempts were successful on first trial. From other studies
we know that the average for all exercises in the course is greater than
lyfo, and thereby conclude that the set of problems chosen for study
here was considerably more difficult than the average exercise in the
course.

In addition to a simple correct-incorrect system of grading, we used
a method of assigning partial credit based on the number of commands
used in a correct or partially correct solution. We found that, on the
average, fewer than half of the typed commands contributed to a correct
solution. Of the commands that were executed, two-thirds contributed
toward a correct solution.

For a third measure of correctness, we used a correct-incorrect
classification in which errors in algebraic formulas were disregarded.
The average proportion correct "up to" algebraic errors was 6f?^, as
compared to the 57% for a strict correct-incorrect measure. Although
the correlation between these two measures of correctness was quite high
(r = .90), there were several problems for which the differences were
extreme.

A detailed analysis of errors was also undertaken. We found 1090
overt errors in the 7063 commands given by students. Two-thirds of
these were syntax errors and one-third were semantic. In looking at
syntax errors we found the most numerous errors (about one-third) to be
either typographical errors or incomplete commands. We also found that
a disturbingly high proportion of the syntax errors (perhaps 20%) were
what we called "errors of overgeneralization," that is, errors that were
apparently caused by an overgeneralization of the syntax rules. These
errors were reasonable constructions in the sense that the intended
meaning was perfectly clear; in fact, in most cases, these erroneous
commands could have been parsed by a slightly more sophisticated
interpreter. It is known that children, in learning natural languages,
overgeneralize on the rules that govern the syntax and usage of that
language. Such examples range from generalizing the rules for creation
of inflections ("goed" as the past of "to go" instead of the irregular
but correct "went") to structural errors and the misapplication of
constructions in pragmatic contexts. The high frequency of the same type
of error in this study, concerned with the learning of a formal as
opposed to a natural language, suggests that there are common governing
principles for the acquisition of both. An awareness of the tendency of
students to overgeneralize from specific rules of syntax could enable
programmers to produce high-level languages that could be more readily
learned. For example, if the AID interpreter allowed the use of multiple
arguments with SET and DEMAND exactly as it did for TYPE and DELETE,
many errors would have been avoided.

In studying semantic errors we found a rather large proportion
(over 2ajo) of algebraic errors. Some of these errors seemed to stem
from ignorance of the correct algebraic formulas and some from incorrect
translation into AID notation. Most incorrect translations were the
result of a poor understanding of the hierarchy of operations. Dummy
variables and their use in the definitions of functions also gave rise
to a number of errors. In general, logical errors, either in IF clauses
or in the sequence of execution, were fewer than anticipated. From this
evidence we concluded that in all likelihood students are less
mathematically sophisticated than presumed by the curriculum;
consequently, in a subsequent revision of the course we included more
instruction in mathematics and delayed the introduction of user-defined
functions until quite late in the course. A future comparison of
semantic errors for the two versions of the course would be needed to
establish the accuracy of our conclusion that much of the difficulty is
curriculum-oriented and can be controlled by the curriculum writer.

The descriptive statistics mentioned above were used in defining 19
different measures of problem difficulty. Three were measures of
proportion correct. Ten were measures of number of errors and error rates,
for both syntax and semantic errors as well as total errors. Five
measures of problem difficulty were measures of the effort expended and
the final measure was the proportion of students who attempted the problem.
In comparing these 19 measures we found that some pairs, such as the first
two measures of proportion correct, were highly correlated, but there were
many pairs for which the correlation coefficient was essentially zero,
leading us to conclude that the measurements are along several different
dimensions of problem difficulty.

We selected six of the 19 measures of problem difficulty for more
intense study. These were (1) the proportion correct on first trial,
(2) the proportion correct up to algebraic errors, (3) the syntax error
rate, (4) the number of students who made semantic errors, (5) the ratio
of commands typed to the number of commands needed for a correct solution,
and (6) the number of students who attempted the problem. Except for the
first two of these six measures, the correlations between pairs were quite
low. By means of step-wise multiple linear regression we derived linear
models that predicted problem difficulty from measurable characteristics
of the problems, the standard solutions, and the curriculum context. The
same set of 10 independent variables was used in each of the six
regressions. These 10 variables measured such characteristics as the
expected proportion of conditional commands (IF), the location of the
problem in the curriculum (LES), the amount of guidance offered by the
curriculum (Help), whether or not loops or subroutines were required
(REIT), the length of an economical correct solution (LNCj), etc. For four
of the six selected measures of problem difficulty, the linear models
derived by the regressions were quite satisfactory, accounting for
two-thirds or more of the variance. The best fit was for the proportion
correct up to algebra, which we also felt was the best measure of
programming difficulty per se; this model accounted for 85% of the
variance. The two linear models that predicted the syntax error rate and
the number of students who made semantic errors were less than
satisfactory; both of these models accounted for less than half the
variance. The independent variables entered into the regressions in
different orders for all six regressions. However, on the average, it
appeared that the most influential variables in predicting problem
difficulty were (1) the position in the curriculum, (2) the expected
proportion of conditional commands, (3) whether or not loops or
subroutines are required, and (4) the amount of guidance offered by
the curriculum. The first and fourth of these can be characterized as
curriculum-dependent variables, whereas the other two are problem-dependent.

We next looked more closely at the kinds of correct solutions
produced by students. The 551 correct or nearly correct solutions given on
first trials were classified according to four sets of criteria. Two
methods of classification were based on the formal, or static,
characteristics of the solutions, and two were based on functional, or
dynamic, characteristics. These four methods of classification were
referred to as formal identity, formal equivalence, algorithmic
equivalence, and functional equivalence. We defined two programs to be
formally identical only if the differences between them were such minor
differences as the use of different letters for variables or the use of
different part numbers for naming programs. Less trivial formal variations
were allowed for formal equivalence. For algorithmic equivalence, we
considered only the sequence of actions of a program and disregarded its
form. The last equivalence relation, functional equivalence, was defined
solely in terms of input and output; both the form of the program and the
sequence of internal states were ignored.

The four equivalence relations exhibited considerable variance in
their grouping power over the data. There were an average of eight types
of solutions per problem when the solutions were classified by formal
identity. Under formal equivalence, the average number of types was five.
Under algorithmic equivalence, the average was 3.6, and under functional
equivalence, 1.7.
Using the four equivalence relations discussed above, we defined
four measures of diversity of solutions: diversity of function, diversity
of algorithm, diversity of equivalent forms, and diversity of identical
forms. The measure of diversity we used is akin to variance but, unlike
variance, is appropriate for categorical scales of measurement. This
measure (or rather, its unbiased estimator) is given by the formula

     D = 1 - \frac{\sum_{i=1}^{k} n_i (n_i - 1)}{N(N-1)}

where N is the total number of solutions, n_i is the number of solutions
in the i-th equivalence class (or, of the i-th type), and k is the number
of equivalence classes (or types). As this measure of diversity is not 
widely known in psychology, we have included a precise mathematical dis- 
cussion of its derivation and properties in Appendix B. 
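
As an illustration only, the following Python sketch computes a diversity value from a list of solution types. It assumes the measure is the estimated probability that two distinct solutions fall in different equivalence classes, that is, 1 minus the sum of n_i(n_i - 1) divided by N(N - 1); the function name `diversity` and the sample labels are hypothetical and do not come from the study.

```python
from collections import Counter


def diversity(labels):
    """Estimate diversity for categorical data.

    Assumed form: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), where n_i is
    the size of the i-th class and N is the total number of solutions.
    """
    counts = Counter(labels).values()
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("at least two solutions are needed")
    # Number of ordered pairs of solutions that share a class.
    same_class_pairs = sum(n * (n - 1) for n in counts)
    return 1 - same_class_pairs / (n_total * (n_total - 1))


# Hypothetical data: six solutions classified into two types.
example = diversity(["loop", "loop", "loop", "table", "table", "table"])
```

Under this assumed form the value is 0 when every solution is of the same type and 1 when no two solutions share a type.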

Once again we used the tool of the step-wise multiple linear
regression to define predictive models for diversity. For this, we used the
same set of 10 independent variables used in the prediction of problem
difficulty. All four of the models accounted for more than 50% of the
variance in diversity; the best fit was for diversity of function, in
which 80% of the variance was accounted for. In examining the order in
which the independent variables entered into the regressions we concluded
that the three most important variables for the prediction of diversity
are (1) whether loops or subroutines are required, (2) the expected
proportion of conditional commands, and (3) the expected length of the
solution. All three of these variables might be characterized as problem
variables rather than curriculum variables. This contrast to our findings
for the prediction of problem difficulty, in which two of the four most
important variables were curriculum-dependent, leads us to conclude that
problem difficulty could be more easily manipulated by the curriculum
designer than could diversity. Another comparison of interest between
the two sets of predictive models is that there are two independent
variables that contribute largely to both. These two variables are (1)
whether loops or subroutines are required and (2) the expected proportion
of conditional commands. In view of the importance of conditionals,
loops, and subroutines in programming, this is an intuitively satisfying
result.

APPENDIX A
The Programming Language AID 



The subset of AID that is described herein includes that part of
the language that is taught in the course "Introduction to Programming:
AID." The following description is an excerpt from "100 Programming
Problems."*

*Friend, J. E. 100 Programming Problems (with a description of the
programming language AID). Institute for Mathematical Studies in the
Social Sciences, Stanford University, September 1973.



AID Commands and Programs

Numbers and Algebraic Expressions

Algebraic expressions in the programming language AID follow ordinary
algebraic notation quite closely. The letters A, B, C, ..., Z are used as
variables, and the following symbols are used for arithmetic operations and
grouping:

     +      addition
     -      subtraction
     *      multiplication
     /      division
     ↑      exponentiation
     ! !    absolute value
     ( )    parentheses

In forming algebraic expressions, juxtaposition cannot be used to indicate
multiplication; the expressions 2x and xy must be written as 2*X and X*Y in
AID notation. Algebraic expressions must be given as a linear string of
symbols, which precludes the use of the horizontal bar as an indicator of
division; the fraction a over b must be written A/B, and a+b over a-b must
be written as (A+B)/(A-B).
Neither can subscripts or superscripts be used; x_1 is written as X(1) and
y^2 is written as Y↑2.

Grouping is indicated with parentheses just as in ordinary algebraic
notation, and parentheses may be imbedded as desired. If parentheses are
not used, arithmetic operations are performed in this order:

     ↑
     * and /   from left to right
     + and -   from left to right

Thus, exponentiation is always done first (unless parentheses are used to
indicate otherwise), then either * or /, and finally either + or -. If two
operations with the same order of precedence appear, they are evaluated in
left-to-right order; in the expression X/Y*Z/W, the first operation to be
performed will be X/Y.

AID numbers may be written in integer form (275) or in decimal form
(537, 0.01, .72). Numbers are limited to nine significant digits and must
be less than 10^100 in absolute value. Numbers may also be written in a form
of scientific notation that is a direct translation of ordinary scientific
notation. For example, 2.3076 x 10^5 is written as 2.3076 * 10↑5. Since
the slash (/) is used to indicate division, an expression like 2/3 is read
as "two divided by three" rather than "two thirds." Because of this, an
expression like X↑2/3 means x^2/3, not x^(2/3); to write x^(2/3) in AID
notation, use X↑(2/3).
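
The precedence rules can be checked by analogy in Python, whose `**`, `*`, and `/` operators follow the same ordering for the cases discussed here. This is a sketch in another language, not AID itself, and the variable values are arbitrary.

```python
x, y, z, w = 8.0, 2.0, 3.0, 4.0

# X/Y*Z/W is evaluated left to right, so X/Y is performed first.
left_to_right = x / y * z / w
explicit_grouping = ((x / y) * z) / w

# X↑2/3 squares x and then divides by 3 ...
square_over_three = x ** 2 / 3
# ... whereas X↑(2/3) raises x to the two-thirds power.
two_thirds_power = x ** (2 / 3)
```

With these values the first two results agree, while the last two differ widely, which is the pitfall the text warns about.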

Negative numbers are indicated by a minus sign: -2.7. When negative
numbers are used in certain combinations, such as 2 * (-3), the negative
number must be enclosed in parentheses; to be on the safe side, always use
parentheses around negative numbers.

The variables A, B, C, ..., Z may be used for numbers, as indicated above.
They may also be used as indexed (subscripted) variables to identify lists
of numbers or arrays of numbers. The list x_1, x_2, ..., x_n is written in
AID notation as X(1), X(2), ..., X(N), and the entire list is then referred
to simply as X. A two-dimensional array (matrix) of numbers may be identified
by a variable using two indices: a_ij is written in AID as A(I,J). Up to 10
indices may be used (for up to 10-dimensional arrays). Indices may be given
as numbers, variables, or algebraic expressions: X(12), X(I,J), and
X(2*I↑3/J,A). Regardless of how the indices are indicated they must have
integer values and are limited to -250 to 250, including zero. Thus, the
longest list of numbers has 501 members: X(-250), X(-249), ..., X(-1), X(0),
X(1), ..., X(249), X(250). A two-dimensional array could have 501 x 501
members, etc.

To summarize, here are some examples of algebraic expressions and their
AID equivalents:

     Algebraic expression      AID equivalent

     5x^2 + 3y^4               5*X↑2 + 3*Y↑4
     z^(1/2)                   Z↑(1/2) or Z↑0.5
     |x - y|                   !X-Y!
     x_1 + x_2 - x_3           X(1) + X(2) - X(3)
     (a+b+c+d)/4               (A+B+C+D)/4



In general, spaces may be used whenever desired in algebraic expressions.
The expression 5*X+4 may also be written 5*X + 4 or 5 * X + 4.
The exceptions to this rule are in indexed variables and, as we shall see
later, in function notation. Expressions like X(5) and A(1,2) must be
written without a space between the identifier and the opening parenthesis;
X (5) or A (1,2) will cause an error message.
The Form of AID Commands

AID commands are quite similar to English commands:

     TYPE X
     SET Y = 7
     STOP

Each command begins with a verb (TYPE, SET, STOP) and the form of the rest
of the command depends upon the verb that is used. The verb TYPE, for
example, may be followed by any algebraic expression (and the result will
be that the expression is evaluated and the value typed on the user's
teletypewriter):

     TYPE X - 2

TYPE S/e^+S/T) * • . 

     TYPE X↑5 + Y↑2

Some commands, like STOP, may consist of only one word, but most commands
have either variables or algebraic expressions or equations or other kinds
of arguments following the verb. Some commands also have optional modifiers,
which are phrases that can be added to the command to modify its meaning.
For example, the TYPE command may be modified by an IN FORM phrase:

     TYPE X IN FORM 12

where Form 12 specifies the form in which X is to be typed. (This will be
explained more fully below.)

With one exception (FORM), AID commands must be given in one line; a
line is terminated by the user by typing the return key on the teletypewriter.

There are two kinds of AID commands: direct commands and indirect
commands. Direct commands will be executed as soon as they are given,
whereas indirect commands are stored and will not be executed until the
user gives an order to do so. Many AID commands may be used as either
direct or indirect commands. To indicate whether a command is to be a
direct command or an indirect command, "step numbers" are used before
indirect commands:

     12.7 TYPE 15/16 + 1/32

This command will be stored rather than executed immediately, and the step
number may be used in later references to the command. When the user wishes
to have the command executed, he gives a DO command like the following:

     DO STEP 12.7

Step numbers are decimal numbers between 1 and 10^9, and, like all
numbers, are limited to 9 significant digits.

When indirect commands are stored, they are grouped into "parts"
according to the integer portion of the step number. Commands numbered
23.2, 23.7, 23.81, and 23.001 are all grouped together into "Part 23."
Indirect commands may be executed singly:

     DO STEP 23.2

or they may be executed in groups:

     DO PART 23

When the above command is given, all the steps in Part 23 will be executed,
in numeric order. When Part 23 is exhausted, the execution will cease; even
if there are steps numbered 24.1, 24.2, etc., execution will not
automatically proceed to Part 24. A set of stored commands, to be executed
as a group, is called a "program." A program may consist of a single part
or, by the use of branching commands as explained below, several parts.
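
The grouping of steps into parts can be sketched in Python as follows. This is a hypothetical model of the bookkeeping only, not of the AID interpreter; the function names and the sample step numbers are assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def group_into_parts(step_numbers):
    """Group step numbers (given as strings) by their integer portion."""
    parts = defaultdict(list)
    for text in step_numbers:
        step = Decimal(text)
        parts[int(step)].append(step)
    # Steps within a part are executed in numeric order.
    return {part: sorted(steps) for part, steps in parts.items()}


def steps_for_do_part(step_numbers, part):
    """Steps a DO PART command would run; other parts are not entered."""
    return group_into_parts(step_numbers).get(part, [])


stored = ["23.2", "23.7", "23.81", "23.001", "24.1", "24.2"]
order = [str(step) for step in steps_for_do_part(stored, 23)]
```

Here `order` lists the Part 23 steps in numeric order, beginning with 23.001, and the steps of Part 24 are left out.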

Although most AID commands can be used either as direct commands or
indirect commands, there are a few that may be used only in one form. Table 1
lists the AID commands and shows which can be used directly and which
indirectly.



Table 1
Direct and Indirect AID Commands

                         May be used     May be used
Command                  directly        indirectly

DELETE                   Yes             Yes*
DEMAND                   No              Yes
DISCARD                  Yes             Yes*
DO                       Yes             Yes*
FILE                     Yes             [illegible]
FORM                     Yes             [illegible]
GO                       Yes             No
LET                      Yes             Yes
RECALL                   Yes             Yes*
SET                      Yes             Yes
SET (short version)      Yes             No
STOP                     No              Yes
TO                       No              Yes
TYPE                     Yes             Yes
USE                      Yes             Yes*

*Rarely used in the indirect form.
Basic Commands: SET, TYPE, DEMAND, TO, and DO

The five commands SET, TYPE, DEMAND, TO, and DO form the core of a
basic AID vocabulary. Together with the algebraic expressions described
above, a few standard AID functions, and the conditional clause described
in the next section, these five commands are sufficient to solve any of the
100 problems given in this booklet.

The SET command is used to assign a value to a variable:

     SET X = 12.7
     SET K = 0.002305
     SET M = K*X↑2

The algebraic expression used on the right of the equal sign may
contain one or more other variables, but all of the variables used must have
values so that the expression can be immediately evaluated. When a SET
command is executed, the expression on the right of the equal sign is
evaluated and that number is stored in temporary (core) storage with the
specified identifier (the variable used on the left of the equal sign); that
stored number may thereafter be referred to by its identifier. A SET command
may be used to "define a variable in terms of itself." The result of the
following sequence of commands would be that the number 7 is stored as M:

     SET N = 13         sets N equal to 13.
     SET N = N + 1      adds 1 to the current value of N.
     SET M = N/2        divides the current value of N by 2.

SET may be used either indirectly (with a step number) or directly.
If used as a direct command, the short form which omits the word SET may
be used:
     X = 7              equivalent to SET X = 7
     K = 0.078305       equivalent to SET K = 0.078305

SET may also be used with indexed variables:

     SET X(2,3) = 7     sets the element x_2,3 of the array X
                        equal to 7.
     L(5) = 72.31       sets L_5 equal to 72.31.

The TYPE command is used with an algebraic expression:

     TYPE (X+K*Y)/3

Here again the algebraic expression must contain only variables that have
values (or will be given values before the TYPE command is executed). When
a TYPE command is executed, the value of the algebraic expression will be
calculated and typed on the user's teletypewriter.

A TYPE command can be given with several arguments, separated by commas:

     TYPE X,Y,(X+Y)/2

This command is equivalent to the three commands:

     TYPE X
     TYPE Y
     TYPE (X+Y)/2

Caution: Only two commands, TYPE and DELETE, allow multiple arguments; other
commands, like SET and DO, use only one argument.

The TYPE command can be used to type text by giving the text enclosed
in quotation marks:

     TYPE "TITLE: COMPOUND INTEREST CALCULATIONS"

Other uses of the TYPE command will be described later.

The DEMAND command can only be used indirectly (as a stored command):

     20.4 DEMAND X

The DEMAND command uses a single variable as an argument, and the result
of such a command is to cause the program to halt, type

     X =

wait for the user to type a value for X, and then continue the execution of
the program. By using DEMAND commands, a program can be written so as to
ask for the data it needs. A useful variant of the DEMAND command is formed
by appending the modifying phrase AS "text." The command

     17.9 DEMAND R AS "INTEREST RATE"

will cause the program to stop at Step 17.9, type

     INTEREST RATE =

and wait for the user to type a value which will be assigned the identifier R.

A feature of the DEMAND command that is frequently useful in iterated
programs is that if the user refuses to give a value for the DEMANDed
variable and responds simply by typing the return key, the execution of the
program will halt at that point; thus, seemingly endless loops can be used
if they incorporate DEMANDs.

DEMAND is used solely for input; SET is used for both input and for
internal computations, and TYPE is used for both computation and output.
Here is an example of a complete program using all three of these commands:

     4.1 TYPE "COMPUTATION OF INTEREST AT 4.5%"
     4.2 SET R = 0.045
     4.3 DEMAND P AS "PRINCIPAL"
     4.4 SET I = R * P
     4.5 SET T = P + I
     4.6 TYPE I,T

This program would be executed by the command

     DO PART 4

and it would start by typing

     COMPUTATION OF INTEREST AT 4.5%
     PRINCIPAL =

As soon as the user typed a value for P, say 200, the program would reply

     I = 9
     T = 209
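
For readers more familiar with current languages, the same computation can be sketched in Python. The function name `interest_program` is hypothetical, the rate of 4.5% is an assumption read from the program title, and the principal is passed as an argument in place of the DEMAND step.

```python
def interest_program(principal, rate=0.045):
    """Mirror of the AID steps: SET I = R * P, then SET T = P + I."""
    interest = rate * principal      # SET I = R * P
    total = principal + interest     # SET T = P + I
    return interest, total           # TYPE I,T


interest, total = interest_program(200)
```

For a principal of 200 this gives an interest of 9 and a total of 209, up to floating-point rounding.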

As mentioned, the steps within a part are ordinarily executed in numeric
order. This order can be overridden by the use of the branching command, TO.
TO, like DEMAND, can be used only as an indirect command. A TO command may
be used to branch to either another step (within the same part or in some
other part) or to another part:

     6.3 TO STEP 7.29     will cause execution of Part 6 to cease and
                          execution of Part 7 to commence at Step 7.29.

     16.42 TO PART 8      will cause execution of Part 16 to cease and
                          execution of Part 8 to commence at the lowest
                          numbered step.

Although a TO command may be used unconditionally, as shown above, simply to
alter the linear sequence of execution, it is more often used conditionally,
that is, with an IF clause, as will be explained in the next section.

Several examples of direct DO commands have been given above. Used
directly, DO causes the execution of a specified step or part:

     DO STEP 7.35
     DO PART 84

DO may also be used indirectly, as part of a program, to cause the execution
of another part as a subroutine:

     7.1 SET P = 3.14159
     7.2 SET R = 15
     7.3 DO PART 12
     7.4 TYPE A

In this program Step 7.3 calls for the execution of Part 12. Part 12 is the
"subroutine" and the DO command in Step 7.3 is the "subroutine call." When
Part 7 is executed, the sequence of execution is:

     Step 7.1
     Step 7.2
     Step 7.3
     All of Part 12
     Step 7.4

Thus, DO as well as TO can be used to override the automatic linear sequence
of execution. The primary difference is that DO calls for another step or
part to be inserted into the part being executed, whereas TO calls for a
complete transfer of control to the part specified. Here are four sample
commands, with comments, to summarize the difference between DO and TO.

     3.6 DO PART 7      will cause all of Part 7 to be executed, followed
                        by the execution of the remainder of Part 3.

     3.6 TO PART 7      will cause all of Part 7 to be executed. Execution
                        will halt at the end of Part 7. The remainder of
                        Part 3 will not be executed automatically.

     3.6 DO STEP 7.5    will cause Step 7.5 to be inserted as a one-step
                        subroutine. After Step 7.5 is done, the remainder
                        of Part 3 will be executed. No other steps in
                        Part 7 will be done.

     3.6 TO STEP 7.5    will cause execution of Part 7 to start at Step
                        7.5. Execution will halt at the end of Part 7,
                        and the remainder of Part 3 will not be executed
                        automatically.
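
The four cases can be traced with a small Python model. This is a simplified sketch, not the AID interpreter: steps are stored in a dictionary keyed by step number, every step other than 3.6 is a hypothetical placeholder that does nothing, the target of DO STEP is assumed to be such a placeholder, and a TO nested inside a subroutine is not modeled.

```python
def make_program(step_3_6):
    """Build a two-part program; only Step 3.6 carries a DO or TO."""
    program = {s: ("PLAIN", None) for s in (3.5, 3.7, 7.4, 7.5, 7.6)}
    program[3.6] = step_3_6
    return program


def run(program, part, start=None, trace=None):
    """Return the list of step numbers in the order they are executed."""
    trace = [] if trace is None else trace
    steps = sorted(s for s in program
                   if int(s) == part and (start is None or s >= start))
    for step in steps:
        trace.append(step)
        verb, target = program[step]
        if verb == "DO PART":
            run(program, target, trace=trace)    # subroutine, then go on
        elif verb == "DO STEP":
            trace.append(target)                 # one-step subroutine
        elif verb == "TO PART":
            run(program, target, trace=trace)    # transfer of control
            break
        elif verb == "TO STEP":
            run(program, int(target), start=target, trace=trace)
            break
    return trace


do_part = run(make_program(("DO PART", 7)), 3)
to_part = run(make_program(("TO PART", 7)), 3)
do_step = run(make_program(("DO STEP", 7.5)), 3)
to_step = run(make_program(("TO STEP", 7.5)), 3)
```

In this model only the two DO variants ever reach Step 3.7; both TO variants end inside Part 7.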
There are two modifiers that may be used with DO commands: TIMES and
FOR. The TIMES modifier is used to specify the number of times the required
step or part will be executed:

     DO STEP 3.5, 6 TIMES
     14.2 DO PART 12, N TIMES

The number of times a step or part is to be iterated may be specified by a
number or a variable, or even an algebraic expression, with the stipulation
that the value is a positive integer.

The second modifier, the FOR clause, specifies values for some variable:

     DO PART 4 FOR X = 7

This command is equivalent to the two commands

     SET X = 7
     DO PART 4

A list of values may be given in the FOR clause if desired:

     DO PART 4 FOR X = 7, 23.8, 19

This command will cause Part 4 to be done three times, once for each of the
listed values for X, and is thus equivalent to the six commands

     SET X = 7
     DO PART 4
     SET X = 23.8
     DO PART 4
     SET X = 19
     DO PART 4

The values for the variable may be given in the form of a "range
specification," as in this example:

     DO PART 21 FOR A = 5(2)13

The range specification 5(2)13 indicates that the initial value of A is to
be 5 and that A is to be incremented by 2 with each successive iteration
until the value of 13 is reached. That is, A will take on the values 5, 7,
9, 11, and 13. Any or all of the initial value, the size of the increment,
and the final value may be given as algebraic expressions, and they need not
be integral. The command

     DO STEP 7.3 FOR Y = 3.2(.2)4

is equivalent to

     DO STEP 7.3 FOR Y = 3.2, 3.4, 3.6, 3.8, 4

When values of a variable are given in a range specification, the final value
is always used. Hence, the command

     DO PART 2 FOR X = 0(2)7

will cause these values of X to be used: 0, 2, 4, 6, 7.

DO commands with either TIMES or FOR modifiers may, of course, be used
as indirect steps to cause iterated execution of a subroutine.
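
The expansion of a range specification can be sketched in Python. The function `expand_range` is a hypothetical helper that follows the rule stated above, namely that the final value is always used; it assumes a positive increment and an initial value no greater than the final value, since other cases are not described in the text.

```python
from fractions import Fraction


def expand_range(first, step, last):
    """Expand first(step)last into the list of values the variable takes."""
    first, step, last = (Fraction(str(v)) for v in (first, step, last))
    if step <= 0 or first > last:
        raise ValueError("only ascending ranges are modeled in this sketch")
    values = []
    current = first
    while current < last:
        values.append(current)
        current += step
    values.append(last)  # the final value is always used
    return [float(v) for v in values]


a_values = expand_range(5, 2, 13)
y_values = expand_range(3.2, 0.2, 4)
x_values = expand_range(0, 2, 7)
```

Exact fractions are used internally so that an increment such as .2 does not accumulate rounding error before the final value is reached.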
The IF Clause

Certain modifiers, such as the AS or TIMES phrases, may be used to
modify specific commands. There is one modifier that may be used with any
AID command, and that is the IF clause. The addition of an IF clause changes
any command from an "unconditional command" to a "conditional command."
Here are a few examples:

     TYPE X/Y IF Y > 0
     3.2 DEMAND H IF T = A + X
     7.3 DO PART 8, 3 TIMES IF X <= Y + 3
     SET Z = X/(Q + S) IF Q + S > X

An IF clause contains the word IF followed by a Boolean expression. Boolean
expressions (also called logical predicates) express relationships between
numbers. The following relational symbols are used:

     <      less than
     >      greater than
     <=     less than or equal
     >=     greater than or equal
     =      equal
     #      not equal

As in ordinary usage, any algebraic expressions may be used in Boolean
expressions:

     X < 0
     X + Y↑4 # Z
     2 >= Z
The Boolean operators AND, OR, and NOT may also be used:

     NOT X < 0
     X < 7 AND Y > 8
     X > 0 OR X < Y - 2
     X # 0 OR Y # 0 OR Z # 0
     (A + B > 0 OR A < 7) AND B >= 12

In evaluating Boolean expressions, the Boolean operators are evaluated in
this order (unless there are parentheses to indicate otherwise):

     NOT
     AND
     OR

When a conditional command is executed, the execution proceeds in two
phases. First, the Boolean expression used in the IF clause is evaluated
to determine whether it is true or false. Second, if the Boolean expression
is true, the main clause will be executed.
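
Python's `not`, `and`, and `or` are evaluated in the same order, so the effect of the ordering can be illustrated by analogy. The values of `x` and `y` below are arbitrary and chosen so that the grouping matters.

```python
x, y = 1, 5

# NOT is applied first, then AND, then OR.
default_order = not x < 0 or x < 7 and y > 8
spelled_out = (not (x < 0)) or ((x < 7) and (y > 8))

# Parentheses change the grouping, and here the result as well.
regrouped = (not x < 0 or x < 7) and y > 8
```

The first two expressions are true and the regrouped one is false, so a conditional command governed by the regrouped expression would not be executed.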

Any command may be modified by an IF clause. One of the most important
uses of the IF clause is in TO commands; a conditional TO command is called
a "conditional branch" and is the principal mechanism used in writing
nonlinear programs, including those with loops. As an example, here is a
simple program with a loop (this program simply counts from 0 to 30 by twos):

     5.1 SET C = 0
     5.2 TYPE C
     5.3 SET C = C + 2
     5.4 TO STEP 5.2 IF C <= 30
     5.5 TYPE "THAT'S ALL."
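
The control flow of this loop can be mirrored step by step in Python. The sketch assumes that the condition in Step 5.4 reads C <= 30, which is what counting from 0 to 30 requires; the function name `count_by_twos` is hypothetical, and typed output is collected in a list rather than printed.

```python
def count_by_twos(limit=30):
    """Follow Steps 5.1 to 5.5, collecting what the program would type."""
    typed = []
    c = 0                          # 5.1 SET C = 0
    while True:
        typed.append(c)            # 5.2 TYPE C
        c = c + 2                  # 5.3 SET C = C + 2
        if c <= limit:             # 5.4 TO STEP 5.2 IF C <= 30
            continue
        break                      # condition false: fall through to 5.5
    typed.append("THAT'S ALL.")    # 5.5 TYPE "THAT'S ALL."
    return typed


output = count_by_twos()
```

The branch is taken for the last time when C reaches 30, so 30 is typed before the closing message.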
Auxiliary Commands: FORM, LET, and DELETE

Besides the five commands (SET, TYPE, DEMAND, TO, and DO) that are used
in writing simple programs, there are a number of auxiliary commands that are
ordinarily used as direct commands. Two of these, FORM and LET, are used to
define forms and functions that will be used by TYPE and SET commands in
programs, and are thus closely associated with the programs themselves. The
other auxiliary commands are used more for bookkeeping or debugging purposes;
these are DELETE, the file commands to be discussed in the following section,
and the debugging commands to be discussed in the section after that.

FORM and LET are used in conjunction with stored programs. FORM is
used to specify the format to be used for output. Ordinarily, when a TYPE
command is used, the output is printed in a standard form. For example,
when the command

     TYPE (X + 2)/Y

is given, the value will be typed in this form:

     (X + 2)/Y = 28.7

If a number is 10^6 or greater or if it is less than .001, it will be typed
in scientific notation rather than decimal form:

     (X + 2)/Y = 2.87 * 10↑(-4)
     (X + 2)/Y = 2.87 * 10↑8

If the user prefers another form for output, he may specify it in a FORM
statement. The FORM statement, unlike other AID commands, requires two
lines; the first line specifies the form number (an integer between 1 and
10^9, to be used in later references) and the second line specifies the form
itself:
     FORM 12:
     THE INTEREST IS ___.__

The location of digits is indicated by the character _ and the position of
the decimal point is shown by a period. When the form specified above is to
be used, the TYPE command is modified by an IN FORM phrase:

     TYPE P * R IN FORM 12

Numbers will be rounded to fit the specified form (which is the easiest way
of rounding numbers to a fixed number of decimal places) and if no decimal
point is specified, the number will be rounded to the nearest integer. When
specifying a form, care must be taken to allow for as many digits before the
decimal point as will be necessary; if an attempt is made to type a number
in a form that is not large enough, an error message will result. If the
number to be typed in a given form is negative, one of the digit locations
will be taken up by the negative sign.

Any symbols, including punctuation marks, may be used in the text of
a form:

, FORM . I . \ . ^ 

° PRINCIPAL +<, INTEREST = $ ,*-*- . *■• *" . ■ . ' 

No text is necessary if the user wishes merely to print a number in a
given form and location.

More than one number may be provided for, and this is the only way in
which more than one number can be printed on the same line:

     FORM 6:

^ ^ ;^LL EARN $ IMTBREST. 

To use a form with several numbers, the multiple-argument form of the TYPE
command is required:

     TYPE P, P*R IN FORM 6
The LET command is also used in conjunction with stored programs, but
may be used independently for direct computations. The primary use of LET
is in the definition of functions. The function f(x) = 3x^2 + 2x is defined
in AID as follows:

     LET F(X) = 3*X↑2 + 2*X

When the function is used, in a SET or TYPE command, a value is substituted
for the dummy variable X in the expression F(X):

     SET Y = F(3)
     TYPE F(5) - F(3*7)

The value that is substituted may be in the form of an algebraic expression,
provided such an expression can be immediately evaluated:

     SET N = 2
     TYPE F(N/6)

Any of the variables A, B, C, ..., Z may be used as function names.
Take care, however, not to use the same identifier for both a real variable
and a function since the first definition will be replaced by the second.

Functions of up to ten variables may be defined; here is an example of
a function of three variables:

     LET F(X, Y, Z) = (X*Y + Y*Z)/X*X - Y*Z

Caution: Do not use a space between the function name and the opening
parenthesis; an expression like F (3) will cause an error message.

A useful variant of the LET command is the conditional form of LET, used
to define functions conditionally. In ordinary notation, a function may
sometimes be defined in this fashion:

     f(x) = -2x   if x < 0
             5x   if x > 0

In AID, this definition is given in a single line:

     LET F(X) = (X < 0: -2*X; X > 0: 5*X)

which is read "If x < 0, f(x) = -2x; if x > 0, f(x) = 5x." In the AID
definition, the entire expression is enclosed in parentheses, the clauses
within the definition are separated by semicolons, and each clause is divided
into a condition and an algebraic expression separated from one another by
a colon. Any number of clauses may be used; in the above example, there
are two clauses.

If the definition of a function is given in ordinary terms with an
"otherwise" clause,

     f(x) = 0    if x < 0
            2x   if x >= 0 and x < 7
            5x   otherwise

the AID definition does not require a condition in the final clause:

     LET F(X) = (X < 0: 0; X >= 0 AND X < 7: 2*X; 5*X)

In this example, the final clause consists only of the algebraic expression
5*X, which will be used whenever all of the conditions in preceding clauses
fail.

When a function definition is used, it is scanned from left to right
until a condition that holds is found. Because of this, it is frequently
possible to simplify AID definitions. For example, the condition in the
second clause of the above example could be simplified from X >= 0 AND
X < 7 to X < 7:

     LET F(X) = (X < 0: 0; X < 7: 2*X; 5*X)
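The left-to-right scan can be made concrete with a small Python sketch. This is an illustration, not AID; the helper `conditional` is a hypothetical name, and raising an exception when no clause applies is an assumption of the sketch, since the text does not say what AID does in that case.

```python
def conditional(clauses, otherwise=None):
    """Build a function from (condition, expression) clauses scanned left to right."""
    def f(x):
        for condition, expression in clauses:
            if condition(x):            # the first condition that holds wins
                return expression(x)
        if otherwise is None:
            raise ValueError("no clause applies")   # assumption of this sketch
        return otherwise(x)             # final clause without a condition
    return f


# (X < 0: 0; X >= 0 AND X < 7: 2*X; 5*X)
full = conditional(
    [(lambda x: x < 0, lambda x: 0),
     (lambda x: x >= 0 and x < 7, lambda x: 2 * x)],
    otherwise=lambda x: 5 * x,
)

# (X < 0: 0; X < 7: 2*X; 5*X)
simplified = conditional(
    [(lambda x: x < 0, lambda x: 0),
     (lambda x: x < 7, lambda x: 2 * x)],
    otherwise=lambda x: 5 * x,
)

print(all(full(x) == simplified(x) for x in range(-10, 20)))
```

The two definitions agree because the second clause is reached only when the first condition, x < 0, has already failed.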

A function may call itself; hence, a variant of the conditional
definition is definition by recursion. Here, for example, is the AID recursive
definition of the factorial function x!:

     LET F(X) = (X = 1: 1; X*F(X-1))
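The same recursion can be sketched in Python for comparison. This is an illustration rather than AID, and the guard against arguments other than positive integers is an addition of the sketch, since the recursive definition never reaches its stopping clause for such values.

```python
def factorial(x):
    # Guard added for this sketch: other arguments would never reach x == 1.
    if not isinstance(x, int) or x < 1:
        raise ValueError("argument must be a positive integer")
    if x == 1:                       # first clause:  X = 1: 1
        return 1
    return x * factorial(x - 1)      # final clause:  X*F(X-1)


print(factorial(5))
```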

LET and FORM serve to store information in core storage. In the one
case a function definition is stored and in the other the definition of
an output form. SET and DEMAND also use core storage; both of these cause
a number and its identifier to be stored. Stored commands (indirect steps)
are also put into core storage, as clued by the step number preceding the
command. In programming it is often necessary to inspect the information
that is being held in core or to delete some of it. The contents of core can
be displayed by using TYPE commands and deleted by means of DELETE commands.
Some examples of such TYPE and DELETE commands are given here, with comments:

     TYPE X                will print the value of X if X is a number or
                           a list or array, or the definition of X if X
                           is a function.
     DELETE X              will delete either a number X or a function X.
     TYPE X(3)             will print the value of X(3).
     DELETE X(3)           will delete the single value X(3) from the list X.
     TYPE FORM 3           will type the definition of Form 3.
     DELETE FORM 3         will delete the definition of Form 3.
     TYPE STEP 7.1         will print the stored command identified as
                           Step 7.1.
     DELETE STEP 7.1       will delete Step 7.1.
     TYPE PART 29          will print all of the steps in Part 29 in
                           numeric order.
     DELETE PART 29        will delete all of the steps in Part 29.
     TYPE ALL              will print the entire contents of core.
     DELETE ALL            will delete everything in core storage.
     TYPE ALL VALUES       will print all numbers, lists, and arrays.
     TYPE ALL FORMULAS     will print all function definitions.
     TYPE ALL STEPS
     TYPE ALL PARTS
     TYPE ALL FORMS
     DELETE ALL VALUES
     DELETE ALL FORMULAS
     DELETE ALL STEPS
     DELETE ALL PARTS
     DELETE ALL FORMS

Both TYPE and DELETE may be used with several arguments, separated by
commas:

     TYPE X, STEP 3.7, F

     DELETE STEP 3.7, PART 9, K, F
These are the only two AID commands that have multiple-argument forms.

File Commands: USE, FILE, RECALL, and DISCARD

Anything that is stored in core will be automatically deleted whenever
the user signs off. Any or all of this information can be copied to more
permanent storage space on the disk. To do this, the file commands USE,
FILE, RECALL, and DISCARD are used. AID files are variable-length disk
files identified by integers from 1 to 2750. The files need not be used
in numeric order, and the user specifies which file he wants to use by giving
a command like

     USE FILE 106

The file number is held in core until another USE command is given (or until
the user signs off), and all subsequent FILE, RECALL, and DISCARD commands
will refer to this file.

Each file is divided into "items," numbered from 1 to 25, and the user
must specify the item when storing or retrieving information. Items need
not be used in numeric order. To file an item, a command like

     FILE PART 7 AS ITEM 3

is given. The user may file a form, a step, a part, a value, a function
definition, or all of these, using commands similar to the TYPE and DELETE
commands shown just above. The entire contents of core may be stored as a
single item by giving a command like

     FILE ALL AS ITEM 17

When information is filed on the disk, the contents of core are not
disturbed; a copy is made for transfer to the disk.

When the user wishes to retrieve information from the file, he uses a
command like

     RECALL ITEM 17

and when he wishes to discard an item from the file, he uses a command like

     DISCARD ITEM 17
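The relationship among USE, FILE, RECALL, and DISCARD can be pictured with a toy model in Python. This is only a sketch under stated assumptions, not a description of how AID is implemented: the class `AidFileStore` and its method names are hypothetical, core is represented by an ordinary dictionary, and the behavior when a missing item is recalled (a Python `KeyError` here) is an assumption.

```python
import copy


class AidFileStore:
    """Toy model of AID disk files: numbered files holding numbered items."""

    MAX_FILE = 2750   # file numbers run from 1 to 2750
    MAX_ITEM = 25     # item numbers run from 1 to 25

    def __init__(self):
        self.disk = {}        # file number -> {item number: stored copy}
        self.current = None   # file selected by the most recent USE

    def use(self, file_number):
        if not 1 <= file_number <= self.MAX_FILE:
            raise ValueError("file number must be from 1 to 2750")
        self.current = file_number

    def _items(self, item):
        if self.current is None:
            raise RuntimeError("no file selected; give a USE command first")
        if not 1 <= item <= self.MAX_ITEM:
            raise ValueError("item number must be from 1 to 25")
        return self.disk.setdefault(self.current, {})

    def file(self, content, item):
        # A copy goes to the disk; the caller's data (core) is not disturbed.
        self._items(item)[item] = copy.deepcopy(content)

    def recall(self, item):
        return copy.deepcopy(self._items(item)[item])

    def discard(self, item):
        del self._items(item)[item]


store = AidFileStore()
store.use(106)                       # USE FILE 106
core = {"X": 4.3, "L": [2769]}
store.file(core, 17)                 # FILE ALL AS ITEM 17
core["X"] = 0                        # later changes in core leave the disk copy alone
restored = store.recall(17)          # RECALL ITEM 17
store.discard(17)                    # DISCARD ITEM 17
```

The selected file number is kept in the object, mirroring the way the file number given in a USE command applies to all later FILE, RECALL, and DISCARD commands.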
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Debugging Commands i ■ 

The commands STOP and GO are used primarily for debugging purposes.
STOP is inserted as a temporary command, to be removed when debugging is
complete, and may be used either conditionally or unconditionally to halt
the execution of the program at the point where the STOP command is
encountered:

     47.3 STOP

     47.352 STOP IF N > 100

While the program is STOPped, the user may inspect or alter the
contents of core, checking current values of variables used by the program,
replacing, inserting, or deleting steps in the program, etc. To resume
execution the user gives the direct command

     GO

During the time the program is STOPped, the user may not execute another
step or part (that is, he cannot give another direct DO command), at least
not if he wishes to resume the execution of the STOPped program at a later
time.

GO may also be used to restart the execution of a program that was
halted because of a syntax error. After the program stops and the error
message is printed, the user may correct the error and then resume execution
from that point by giving a direct GO command.

Temporary TYPE commands may also be used for debugging purposes. These
are commands like

     32.105 TYPE X, Y, K, N

that are inserted temporarily so that the values of variables will be typed
for inspection. When debugging is complete, these commands, and temporary
STOP commands, are removed by giving DELETE commands:

     DELETE STEP 47.3, STEP 32.105




Summary of AID Commands

The following summary of AID commands is given in the form of examples,
with comments. Commands that are ordinarily used directly are shown without
step numbers and those that are ordinarily used indirectly are shown with
step numbers; to find out which commands must be used directly (or indirectly),
refer to Table 1.

Most of the examples are shown as unconditional commands; however, any
command may be used conditionally (modified by an IF clause) as desired.



DELETE X                        deletes the identifier X and its value.
DELETE F                        deletes the definition of the function F.
DELETE A(2,3)                   deletes the element A(2,3) from the array A.
DELETE STEP 7.1                 deletes Step 7.1.
DELETE PART 7                   deletes all steps in Part 7.
DELETE FORM 22                  deletes the definition of Form 22.
DELETE K, STEP [illegible], STEP [illegible]
                                deletes the three specified items.
DELETE ALL VALUES               deletes all real variables and their
                                values.
DELETE ALL STEPS                etc.
DELETE ALL PARTS
DELETE ALL FORMS
DELETE ALL

DEMAND M                        requests a value for the real variable M.
DEMAND A(2,3)                   requests a value for the element A(2,3)
                                of the array A.
3.7 DEMAND X(I,J,K)             requests a value for the element X(I,J,K)
                                of the three-dimensional array X.
16.14 DEMAND X AS "RADIUS"      requests a value for X by typing
                                RADIUS =

DISCARD ITEM 20                 discards Item 20 from the previously
                                designated disk file (see USE).

DO STEP 6.2                     executes Step 6.2.
DO PART 9                       executes the steps in Part 9 in numeric
                                order.
DO PART 12, 7 TIMES             executes Part 12, 7 times.
DO PART 4 FOR X = 2, 7, 4.3     executes Part 4, 3 times, once with
                                X = 2, once with X = 7, and once with
                                X = 4.3.
7.2 DO PART 6, N TIMES          executes Part 6 (as a subroutine), N
                                times.
62.15 DO STEP 32.3 FOR A = 5(2)12
                                executes Step 32.3 once for each of
                                these values of A: 5, 7, 9, 12.

FILE X AS ITEM 2                files the identifier X and its value
                                as Item 2 of the previously designated
                                disk file (see USE).
FILE A(7,3) AS ITEM 6
FILE FORM 3 AS ITEM 12
FILE STEP 6.25 AS ITEM [illegible]
FILE PART 9 AS ITEM 1
FILE ALL STEPS AS ITEM 5
FILE ALL PARTS AS ITEM 21
FILE ALL FORMS AS ITEM 7
FILE ALL VALUES AS ITEM 14
FILE ALL AS ITEM 3

(Note: The item number must be an integer from 1 to 25.)



FORM 7:
THE LENGTH IS ←← INCHES MORE THAN THE WIDTH.
                                defines an output form with allowance
                                for one value (see TYPE...IN FORM...).

FORM 13:
←←←.←←   ←←←.←←   ←←←.←←
                                defines an output form with allowance
                                for three values, but no text.

FORM 2:
THE COST OF ←← ITEMS IS $ ←←←.←←
                                defines an output form with allowance
                                for two values. The first value will
                                be rounded to the nearest integer, and
                                the second value will be rounded to
                                two decimal places.

(Note: The form number must be a positive integer less than 10 [exponent illegible].)

GO                              continues the execution of a program
                                halted by a STOP command or by a syntax
                                error.



LET F(X) = 5*X↑2 - 7            defines the function f(x) = 5x² - 7.

LET V(R,H) = 3.14159265*R↑2*H   defines the function V(r,h) = πr²h.
                                (Functions of up to 10 variables may
                                be defined.)

LET F(X) = (X < 0: X↑2 + 5; X >= 0: X + 5)
                                defines the function
                                     f(x) = x² + 5   if x < 0
                                            x + 5    if x >= 0

LET F(X) = (X = 1: 1; X + F(X-1))
                                defines the recursive function
                                     f(x) = 1            if x = 1
                                            x + f(x-1)   if x > 1

RECALL ITEM 7                   recalls Item 7 from the previously
                                designated disk file (see USE).



SET P = 3.14159265              assigns the value 3.14159265 to the
                                identifier P.
6.35 SET A(5,7) = 12.31         assigns the value 12.31 to the element
                                A(5,7) in the array A.
7.3 SET N = N + 1               increases the current value of N by 1.
X = 4.3                         short form of the SET command,
                                equivalent to
                                     SET X = 4.3
L(7) = 2769                     short form of the SET command,
                                equivalent to
                                     SET L(7) = 2769

7.3 STOP                        causes the program to stop execution
                                at Step 7.3 (see GO).
26.64 STOP IF N > M + 1         causes the execution of the program
                                to stop at Step 26.64 if N > M + 1.

31.3 TO STEP 31.1 IF N < 100    causes a branch to Step 31.1 if N < 100.
8.25 TO PART 9                  causes an unconditional branch to
                                Part 9.



TYPE X↑Y                        evaluates x^y and types the result.
7.5 TYPE X, F(X)                types the values of X and F(X).
[step number illegible] TYPE "TAX COMPUTATIONS"
                                types an exact copy of the text
                                enclosed in quotation marks.
TYPE FORM 2                     types the definition of Form 2.
TYPE STEP 3.7                   types the command stored as Step 3.7.
TYPE PART 5                     types all of the commands in Part 5.
TYPE ALL STEPS
TYPE ALL PARTS
TYPE ALL FORMS
TYPE ALL VALUES
TYPE ALL
3.8 TYPE 5*X IN FORM 2          evaluates 5x and types the result in
                                the specified output form (see FORM).

USE FILE 100                    designates the disk file to be used
                                by subsequent FILE, RECALL, and DISCARD
                                commands.



(Note: The file number must be a positive integer from 1 to 2750.)

AID Functions

In addition to the functions that may be defined by the user by means
of LET commands, there are a number of useful standard AID functions. There
are two trigonometric functions, SIN(X) and COS(X); X is in radians and must
have an absolute value less than 100. The natural logarithm function LOG(X)
yields the logarithm to the base e of x, where x is any positive real number.
The inverse of the LOG function is the exponential function EXP(X), equivalent
to e^x.

Several functions depend upon features of the decimal representation or
scientific notation of the argument:

IP(X), the "integer part" function, yields the integer portion of the
     decimal representation of the number x. For example, IP(7304.56)
     = 7304.
FP(X), the "fraction part" function, yields the fractional portion of
     the decimal representation of the number x. FP(7304.56) = .56.
DP(X), the "digit part" function, yields the digital part of the
     scientific notation of x. For example, DP(3789.54) = 3.78954,
     since the scientific notation for x is 3.78954 × 10^3.
XP(X), the "exponent part" function, yields the exponent part of the
     scientific notation. For example, XP(3789.54) = 3 since 3 is
     used as the exponent of 10 in the representation 3.78954 × 10^3.
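The four functions can be approximated in Python to make the definitions concrete. This is a sketch, not AID: the lower-case function names are hypothetical, the treatment of negative arguments (truncation toward zero) is an assumption, zero is not handled by `xp` and `dp`, and binary floating-point arithmetic means the results are only approximately equal to the decimal values in the examples.

```python
import math


def ip(x):
    """Integer part of the decimal representation (truncation toward zero assumed)."""
    return math.trunc(x)


def fp(x):
    """Fraction part: what is left after removing the integer part."""
    return x - math.trunc(x)


def xp(x):
    """Exponent of 10 in the scientific notation of a nonzero x."""
    return math.floor(math.log10(abs(x)))


def dp(x):
    """Digit part: x scaled so that one digit precedes the decimal point."""
    return x / 10 ** xp(x)


print(ip(7304.56), fp(7304.56))
print(dp(3789.54), xp(3789.54))
```

For any nonzero x, the definitions are tied together by x = dp(x) × 10^xp(x) and x = ip(x) + fp(x).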

Two other real functions that are occasionally used are SGN(X), the
"sign" function, and SQRT(X), the "square root" function. These are defined
as follows:

     SGN(X) =  1 if x is positive
               0 if x is zero
              -1 if x is negative

     SQRT(X) = √x
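A Python counterpart of the two definitions, given only as an illustration; the name `sgn` is hypothetical, and Python's `math.sqrt` stands in for SQRT.

```python
import math


def sgn(x):
    """Return 1, 0, or -1 according to the sign of x."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


print(sgn(12.5), sgn(0), sgn(-3), math.sqrt(49))
```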

There are four functions on lists of real numbers: MAX, MIN, SUM, and
PROD. The forms of these are similar, and the resulting values are,
respectively, the maximum of the specified list, the minimum, the sum of the
numbers in the list, and the product. Each of these four functions may be
used by simply listing the members of the argument:

     MIN(.89, 2/3, .63) has a value of .63

     SUM(2, 15, 0, 4) has a value of 21

The list of numbers to be used as an argument may be given by specifying a
formula and the values of the dummy variable used in the formula:

     SUM(I = 2, 10, 3: I*5) is equivalent to
     SUM(2*5, 10*5, 3*5).

The values of the variable may be given in a range specification:

     SUM(I = 5(1)10: 3*I-7)

This expression is equivalent to the sum of 3i - 7 for i = 5, 6, ..., 10.
Similarly, the expression

     PROD(J = 0(2)6: J↑2)

is equivalent to 0² × 2² × 4² × 6².


The function FIRST is a function on an indexed list of Boolean
expressions. For a specified list of Boolean expressions, the FIRST function will
yield the index of the first true expression. That is, it will find the
location of the first true predicate. The form of the FIRST function is
shown in this example:

     FIRST(I = 1(1)50: I > 6↑2 + 3)

The value of this expression will be the first value of i in the set
{1, 2, 3, ..., 50} such that i > 6² + 3 (that value is 40).

Another simpler function on Boolean expressions is the function TV(X),
which yields either 1 or 0 depending upon whether the Boolean expression X
is true or false. For example, the value of TV(4 < 1 OR 5 > 4) is 1.
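Both Boolean functions can be sketched in a few lines of Python. This is an illustration, not AID; the names `first` and `tv` are hypothetical, and raising an exception when no index satisfies the predicate is an assumption, since the text does not describe that case.

```python
def first(indices, predicate):
    """Return the first index for which the predicate is true."""
    for i in indices:
        if predicate(i):
            return i
    raise ValueError("no index satisfies the predicate")   # assumption of this sketch


def tv(condition):
    """Truth value: 1 if the Boolean expression is true, 0 if it is false."""
    return 1 if condition else 0


# FIRST(I = 1(1)50: I > 6↑2 + 3)
index = first(range(1, 51), lambda i: i > 6 ** 2 + 3)

# A true disjunction has truth value 1.
value = tv(4 < 1 or 5 > 4)

print(index, value)
```

Since 6² + 3 = 39, the first index in 1, 2, ..., 50 that exceeds it is 40.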

For all of the standard AID functions, the values are real numbers;
hence, these functions can be used anywhere in algebraic expressions just
as in ordinary algebraic notation. They may also be combined and composed
in the usual ways. Here are a few examples of algebraic expressions in
ordinary notation and in AID notation:
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APPENDIX B 

The Yoxm^ of Programs Written 'by Students (with Division 
^6 Equivalence Classeu u^ing Four Definitions 
of Program Equivalence) 




(U 

§ 



05 

a 

w 
H 

a 

(U 

a 

cd 
> 

?i 
cr 
W 



H a 
2 a 

o H 
•H cd 

a *M 

9 5^ 

p. o? 



Si cu 

O .H 
31^ 



(0 

(/} 
cd 
H 
a 



CU 



a 

pi^ 0 



CM 



Pt5 



(U 



P>a 



CI/ 



in 



a 



m 



On ' VD 



CVJ (V 



/ 



(U. 

CO 

CO 

cd 

H 

a 



cn 
m 
II 



t30 

o 



o 



m 

II • - 

W 



cn 

ON, 



m -if 

voo • • on 
.\0 OJ 



II II 



III! 



— r 

O H 
P4 ^ 



o 
m 
1 

13 




ERIC . 



13Z 



fa 

a 

o 

•H 

+J 
•H' 
+J 

0 

x: 
> 

Vi 
H 

o 

O 

a 

H 
> 



O 

a 



cd 

•H Cd 



> 



O (D 
•rl O 

O 'H 



O 



(U 
O 

a 

H 

CO 
> 



O <D 



J ^ 



OJ 



on 



H 



03 

cd 



(U 
CQ 
CQ 

CO 



OJ 



H 



OJ 



H 



O 
O 



O 



H 

O 



CO 



CVJ. 



+ 

"2^ 



on 
on 

>:? 



EH 



9^ 



II o> . -zj- 

H + 
^ H 

pt4 pt4 

^ W ^ ^ 



H 
I 

H !>- + 
^ H M3 >0 

w pt4 

^ w w y 

EH E^ 



cn 

>< I? 

II o> 



en 

* ON 
?V1 H 

* 



* 

OJ 

* 



* 
OJ 

* 



II 



9 



W CO 



ON 

I 

<3 



EH 



CVJ 



W CO 0\ 
H rH 

^> > 
> 



vo in 



^co W CTs'in W 
W H LTN H 

p::; II II II II ^ 
^ > > 

> p::; W W 



3 cd CO 



CO CO 



o 

4-> 



(d 

CJ 
(U 

CO 
03 

(d 
H 

o 

(U 

o 

cd 
> 



(U 

H O 

(d a 

O H 
•H cd 
•p > 



P«4 



P14 



P14 



P«4 



H 



O 0) 
^ (U 



I, 




a 

H (U 
Cd H > 

g ^ 

O -H 



H 4-> 
cd •H 

O (U 
H 



o 

O 

g 

O 



H (U 

o d 
^ 3 



00 



on 



OJ 



M 



H 



00 
M 



on 



CJ 



CM 



* 

^ 
H • • 



>>> 



^ • 

H ON 

• > 

on 

CM k. 
II •^ 



•4- 



>H on •^ 

X II 

>< 

Ph 



















* 
























on 






on 






• 


• 






UN 










H 










X 




(W 










\'> 










• 


H 


CM vX) 




















Si 




* 




* 












H ^ 






UN 






•4- 


• 


H • 








• 00 








on H 


• 

00 


II 




II •^ 


H 









W 00 



CO 



•^Lf^ 

en 



9 bH S 



CM 

•4- 

& ■ 

* 

-:t vo on 

H • • 

• Op CTs 

on H H 



HI- 

(M 

* 

H 

or^ 

II LfNVO 



on 



^ t>- 00, W LTN CJN W 

W Lf^ H -^VO H 

II II ^ IK, II - 

en Ph B Ph 

P£) W S w Pq S 
CO CO tH CO CO EH 



OJ a 

I o 

"33 



ERIC 



194 



H O 

cd c 

o H 

•H cd 

+j > 

V .H 



(U 

o 
a 

> 

•H 



(U 

o 
a 

H (U 
CO H 

O .H 



O (U 



C/3 
(/3 

Cd 
H 



CO 
0) 
(0 
CO 

cd 
H 
o 



CO 

cu 

CO 
CO 

cd 
H 
o 



7 



H H 




ON 

CM 
U 



O 

Cm 
O 



O 



4- 

* 

on 

II LfNVO 

^ 00 
K LfN H 

II II 
> W K 

W ^ 
3 to CO 



* 



ON^ 

C\J 

on 



PC? LTN 



on 

o^x'k' 

r-l 



LfNVO 



-:t on 



K LA. H 



II 

ft; 



CO CO 



W It II - 

— , > 

> W K 

CO CO 



K LA 



ON K 



o 

H 



II 



^1 

CVJ 



CM 01 

on on 

I CJ 
LTS CJ 



ON 

LA 
* 

on 
I 



CJ 



CJ 



.o" a? 0?^ 
o H on c\j 



o o o 
^ ^ ^ 



o o 



H Q) 

2 I 



I o 



r 



C\J 
I 



ERIC 



1 



• 
• 
• 


Punqtional 
Equivalence 










H 




H 


tione 


;hmi c 
.ence 
















sn Parti 


Algorit 
Equivaa 














H 


ence Class wh( 


Formal 
Equivalence 


H 


H 


CVJ 

w 




H 




H 


Equival 


Formal 
Identity 


M 


ir\ 

H 


VD 
H 




H. 




M 






OJ 


OJ 






H 




H 



O 



o 



a , 

H Q> 

2 B 




ON 

cn 
I 



EH 



3 

9^ 



EH 



6P 



O 

EH 

CO 









CVJ 








CO 


'ST 


11 




11 














w 








EH 




EH 




on 
I 

II 



ON 
1/N 



OJ 
I 



o H m OJ 
o' o' C> CJ 
&^ 6P 



o 



9P &5 



oToT^ 

CO OJ 
cj cT 

&^ 



00 -P 
OJ fl 

I o 



H V 

CO a 
p cr 



a (D 

•H o 

E a 

O .H 



O 

a 

H 0) 

(d H 

O ♦H 



H 

W 



C\J 



OJ 



CVJ 



H 

H 



GO 

O 



H (D 



ON 



O 
rH 

M 



on 



(>0 

Pu 

ON 



H 




OJ 




CVJ 




2i 




m 








OJ 






P^ 




p^r 




p^r 


II 




II 




ri 












P^ 








P^ 






w 


















CO 




CO 





II 



Ph 



o 



CVJ 



OJ 



2i 

. OJ . 



ft, pL, ' pL, pL, 

3 CO Eh CO H W CO EH CO tH 



CO 

OJ C 
I O 



ERIC 



VJ7 



§ 



CO 

c 

I 

to 

CO 



H a 

CO c 

O H 

3 o« 



O 'H 

3s 



a 



a 

d 

H 0) 

CO H 

O 'H 



a 



^2; 



CO) 

o 
u 

o 



o 



. H 



(0 
CO 

cd 
H 

o 



w 

(U 

(/] 
(/] 

CO 

H 

a 

C\J 



cu 

03 
03 
CO 
H 
O 



C/3 

W 
03 

H 
O 



II 



CD 
CO 

I 



^ O C\J C\J H 

^ o H m ^- OJ 

^ cJcT CJ 
o 

M y M QlI 



P4 



P«4 



pa 



OJ 



H 



LA 



on 



on 




(U M 

o a 
Ph ^ 



00 -P 

I o 
a 



I 



ERIC 



1S8 



« 



CO 
G 

V 

a 
.21 



(U 

H O 

cd C 

C (U 

O H 

•H cd 

•H > 
O 

3 cr 

1^ W 



o 
c 

O -H 

w 



0/ 

a 

a" 



H 

Cd 



a, 



00 
M 




a) 
+ 

k 

Of 
C/3 



4- H 



< 



on ^ 



•s 

•.CO 



CO 
H 



< 

CO 



^ on 



CVJ 

if] 
+ 

'hi 

r/j 



on 

o On 

i " MM 



on -4- on •^ctn 

•V < < < 

13 ^ I S S I 

»^ CO cn Eh CO cn EH 



CVJ 



on 

H 



CO C-H CO 



\ 



on 
I c 
On O 



ERIC 



i 

O 



0 



•a 

o 

5 



a; 
o 

> 



Clt 



o 

H 

> 



CM 



CO 



(0 

a; 

(0 
(0 
(0 



(0 

to 

(0 



H 



no 



00 



to 
o 
u 

o 

n 

o 

Pt4 



4- 



\ 



pq 

EH 



m^- •sr-irH •M-fm »^C7N»H 

11 1 11-^ 11 11 n 11 Ml ^ 

< pq ^ < pti' ^ < pq < pq . 

pL] &q 



PlH 00 

II t ^ H 
^ OJ 

H novo -4" 



^of <y Of Of 

Ii3 W W y 



O 

^ r-l 

^ 

M t) H 
OO O 



S C^OO 

'cy cy 



CO 

I 

On 



ERIC 



10 ■ 



) 



, CVJ 



to 

(U 
CO 



- 4t~> 



/ 

CO 



CM 



-it 



M 



(0" 
(U 
CO 
CO 



CO 
CO 

c« 
H 
o 



o 

a. 

o 



o 
p«4 



ST 



CVJ 

f5 H 00 

ON m 

^ II It 



Eh 



»H 00 

II II 



^ w ^ 



^ o ^ 

00 H ^ 00 



*M II II II 

w 

|q M S M S 

CO cn Eh K K 



Of 



p 



H 00 

li 

•soO 

^ OJ o 

a •^00 



Eh 



H 
H o 

M 

t\l H 

^ On 



- cn 



\ ^• H \ 
H cuoo \ o 

t^\QO 00 

ONH h^vo 
>< II 11 II 11 
< u 

^ s s K S 

3 CO C/3 C/D CO 



a 

PQ 



00 C 
I o 



ERJC 



11 



201 



Q) 

O 
•H 



CO 

a 

CO 
CO 

H 
O 

(U 

o 



H O 

CO d 

O H 

•H CO 

•P > 

O .H 

It 



1^4 



H 

P«4 



F«4 



6/ 



H r- 



pi4 fx< (*< P>4 ' 



CO 
CO 
CO 



•p H 

•H CO 

G > 

o *H 



( ' , 



00 
CO 



c3 



O 
P14 



o 

I 



www 



OJ 



CO 
(U 
(0 
(0 
CO 



OJ 



•5 



O (U 

F14 'd 



CVJ 



^ ITN V£) 



CXD ON 



(0 
(U 
CO 
CO 
CO 



in 



CVJ 

Q 



CM 
O 



Q 



C\J 



m 

OJ 

o 



m 

OJ 

o 



on 

OJ 

o 



on 



on 

8J 



OJ 




ERIC 



12 



o 

•H 
•P 
•H 

cd 

CQ 

cd 
.H 
o 

o 
a 



* H 



CO 
CO 

cd 



1^4 



Pt4 



V (D 

-H O 

i ^ 

O 



CO 
CO 

cd 



o 

H (U 

Cd H 

O -H 




CO 

cd 



OJ 



•3 



O (D 
. H 



H M 



CO 
(D 
CO 

w 
cd 
H 
o 

en 



OJ 



on 



O 

on 
It 




o 

1^, 



H 

HI 
X 

Eh ^ 

3 H 
w • 

H 



Q 



H 



o 

LA 



a 



tA 

H 

X . 

H 



VO O 
H H 11 

c\j Q m P ce; 



li II 
o < 



AH 



CO CO CO H 



o 
H 



o 

' s 

0"^ (\J r-H 
54s 4- 

• Vp P5 II 

H * *' 

.^t vo P^i 
H H 

fr; m h o 



a; 



-4; 



cc m 
w w w w 



a 



H OJ cn-d- 



* 



m 

II 



o 

H 

II 

o 



Pi3 



H OJ ITNVO 



a 









<y\ 


H (U 


H 




1 


O g 




^1 3 




{2; 





ERIC 



13 



• 
• 
• 


Functional 
Equivalence 


H 




H 


H ■ 

P«4 


H . 


H 


• 


Class when Partitionec 


Algorithntic 
Equivalence 








- 








L valence 


OJ 






• 

H 
M 




on 


; 


Equivalence 


\j ••n 
















Pcmal 
Identity 


H 






H 


I- 
W 


00 
H 
















/ . 




f 




' ' ! ■ 



I 

o 

O 

o 



O 
C> 



O 



o 



&5 



VO 
'-^ 

Pd on H o 

* * • pi-t 
OJ cvj 

&l W - 



o 

ITS' 

f 



o 
on 



o 

Q 



o 



O 



l/N 



Ccl 0^ 



H OJ OO-it 



II 11 II 

Q P < 

CO CO CO to 

H C\J 0^-^ lA 



< H 



o'er; 
O 

Piq 



I 

PM 

h8 



H * H 

r5 ^ •s It 
• -4" C J 

tti m H 

C\i Q m Q Ct4 

o 

II II ii -^pr^ 
Piq 

H cvj on^. 
• • • • O 

H H H H O 



11 



ct^ on H ,it; 

CVJ W 0»'"> C4 

o 

'II II II ^ . 

*Q O < H 
*Pxq 

EH EH &H'P^ EH 

p^ 

H OJ on-^j- 
• • • • Q 

H H H H ^ 



r4 IM 

ix^ H a< 
*tf • * * 

II ti Ii if 
p^Pk'o < 

CO C/) C/J CO 



o 



o. 



P 



OJ On.^ LTN'V 

• • • • o 

H H H H 



01 ^ 
H (U 



8 



ERIC 



lU 




15 



ERIC 



fi- 



4 



o 

•H 
•H 

t: 

cd 

CJ 
(U 

•§ 

CO 
(0 

cd 
H 
O 

G 
H 
^ > 

a* 



(U 

H o 

' 03 CJ 

O H 
•H 03 
> 

P4 W 



OJ 



o 



(U 

o 

•H 03 

u > 

O -H 

r-j a< 



o 

t 



o 
C3 

<u 

H 
03 
> 

a* 



03 



O <U 



O 
U 

Cm 
O 



O 



o d 



■ W 



CO 



W 
W 
03 
H 

OJ 



(U 

W 
03 
H 
O 



03 

HI 

o 



03 
H 
O 

CO 



OJ 




ON • 

OJ +^ 

m o 



OJ 



OJ 



> m 







V 


\7 




11 




X 










^ 

H 


Pi^ 








H 




o 








• 






X 




• 


H 






« 


H 
















11 


1 








i 














OJ 








« 


« 


W 


a 


s 




H 








LA 
H 

H 




18 



/ 



\ 



o 

4J 

cd 

a 

;^ 

Q 

o 

a 



H O 



i a 

•H Cd 
O 



, q; 
O 

a 

H OJ 
cd H 

O 



cd -H 

O QJ 



P14 



.CM 



OJ 



OJ 



OJ 



pj 



JCM 




LTV / 



\ 

I H 



i 




19 



ERIC 



Lf3 



I 




CVi 



CVJ 








V V 






• 




H 




S i 




TO SI 
PART : 





V 
X 

M 
X 

w 



V 



II 

X 




cd 
H 
o 



OJ 

xo 

(0 
CO 
H 

o 

OJ 



(0 

cd 
H 
o 



OJ 

cd 

1-4 
O 

H 



O 
CVJ 

II 



• • • Q 

H H H « 



H 



a 



H 
as 

a 
o 



§ 

•H 

(d 

a 
0^ 



to 

H 
o 

V 

a 

H 

a* 



(U 

o 

a 

> . 



4-> 
O 



(U 

o 

as 



O 

a 

0) 



o 
P14 



a* 



M 



OJ 



C\J 



CO 



• H 



C\J 



/H 



CO 



CO 



O 

O 

g 
o 




W W'W EH 

V vvS 



■ pq O Q 

< pq O « 

ll II II II 

CQ CQ CO CO 



EH CO 



< PC^ O Q 



c^i CO CO CO ^ 

LfNVO CO ON ON 
• ••••• 

r-\ H H H H H 






CO 


CO 


CO 




V 


V 


V 






0 


Q 


< 




ft 

0 


ft 


II 


II 


II 


II 


CO 


CO 




CO 


SET 


SET 


SET 


SET 



CO 



OJ cnj^ ifwo t^co o\ 
• ••••••• o 

HHHHHHHHP 



CO CO CO 

V* V V 

m o Q 

<: pq O Q 

II II II II 

CO CO 0!) CO 

^ ^ ^ w 

CO CO CO CO 

• • • • • • • 

H H H r-\ H r-\ H 





CO 

i 




0 , 
o g 



I 



id 

ERIC 



21 



■o 

CO 

a 

•§ 

CO 
CO 
05 
H 
O 

(D 
O 

(D 
H 

> 



0) 

H O 

O H 
•H 05 
-P > 

g :i 



o 

Q) 
05 

t> 

O -H 



•Id 

o 

3' 



i 

o 

P«4 



a 

05 
•H 



O 0) 



O 
O 

n 

o 

P«4 



2 



P4 



P«4 



on 



OJ 



VO 

M 




< PQ O P 



ITNVO C^OO ON 
• • • • • Q 

H H H H H O 




^ ^ „ U 

Q P P CO CO CO CO 
H OJ 0O-:t ITNVO C^OO 




H -P 

ITN O 
H O 



ERLC 



22 



o 

•H 
•H 



cd 
a 

CO 
CO 

cd 

(1) 
o 



H O 

Cd a 

o H 

•p > 

0 ^ 

1 g. 



0 (U 

1 a 

■a ^ 

O -H 
U) 

d S< 



H 



o 

CJ 
(D 
H 

Cd 
> 

•H 

?i 
o< 
W 



Cd -H 

O (U 
H 



bo 

s 

O 



o 
p4 



CO 
CO 

cd 



CO 
(U 
CO 
CO 

cd 



on 



CO 
(U 
CO 
CO 

cd 



CO 
(U 
CO 
CO 

cd 



on 

OJ 

II 




CO CO CO Eh 

V vvg 
pq o Q y 

< PQ O P g 
II 11 II II Eh CO 
CO CO CO CO " 



CO 



H <M 
• • 

H H 



S S S 
CO CO CO 

on-sj- ir\KO c^co ctn on 
• ••••••• o 

HHHHHHHHO 




H 4-> 

1 a 

lA O 
H O 



OJ 



H 



c\j 



H 



on 



A A ^ CO 

^ H 
< pq 

II 




/ 



H 



ERIC 



23 



0 , 
f , 



a 
o 

o 



q; 
o 

•H 

-P 



CO 

a 

CO 
CO 

^, 

o 
a 

(D 
H 



O 



4J 
U 

o 

t}0 



a; 
o 
a 

H 



H 



o 



a 



■2 

Ol 



Q 

P*4 



s I 




piq 



ipiq 



ro 



H 



OJ 



/ - 




CM ro-=f • 

. f • Q 

H H H O 



H CM CO g 
H H H W CO Q 



CO . 

I a 

lA o 



214 



2k 



o 



I 

(d 



(U 

o 
U 

cd 

t 

CP 



a 
o H 



■i 



(U 
CO 
CO 

ed 

H 

u 

CVJ 



•H O 

# CJ 

■P H 
O 

bp J3 



CO 
(U 
C/3 
CO 

o 



O 



o 

(d 



H 



Cd 'H 

O (U 
P4 



CO 
H 



f 



ON 



CO 
<D 

CO 
CO 



CO 
(U 
CO 
CO 

cd 



2 

O 



O 
Pi4 





• • • • • o 

H H H H H Q 



UN 



H (U 



4J 
irs o 

1^3 



id 



25 . ^15 



a 
o 

-p 



H 
o 

o 
a 



H O 

S ^ 

0 H 

-P > 

a 'H 

1 & 



H 



a • 
> 



. o 



CVJ 



OJ 



H 



o 
o 



o 



o 


o 


o 


A 


V 


11 


>^ 


>^ 




O 


. o 


o 


A 


V 


II 


X 


X 


X 




t\ U & 

O H O H O H O 

II tl A II V II II 

CO CO tsl CO tsl CO tsi 



CO CO < CO 

• • • • 

H H H H 



H 




O 

A 



O 
A 
X 



o 

V 

o 

V 



o 

It 



o 
II 

X 



& 

O H O H O H O 

II II A -11 V I! II 

CO CO Csl CO Csl CO CnI 



EH EH I 

Pi CO CO • 
« • • • 

r-{ H r-\ r-\ 



EH 
CO 

H 



CO - 
H 



00 
H 



ON 





U 












A 




O 






U 




H 









216 



.26 



o 

•H 
•P 
•H 



cd 

a 

I 

CQ 
CQ 

Cd 
H 

o 

(U 

o 

cd 
> 



■ (U 

O H 

iri cd 

•p > 



PI4 



O (U 

5 w 



o 

cd 
> 



14 



OJ 



CVJ 



cd .H 

O (U 



on 



H 



H 



U) 

s 

o 



o 
p«4 




o 

V 

p 

V 
X 



o 

A 

O 

A 
X 



O H O H O 

11 It V II A 

CO CO CO 

Eh EH p EH 

W W S M 

CO CO < CO 









APPENDIX & 
The Theory of Diversity and Coherence 



*This appendix is an excerpt from J. E. Friend & M. T. Kane, "Diversity
of Coherence," (in preparation).

Definitions 1 to 5 and Theorems 1 to 7 are standard mathematical
notions and are included for completeness and clarity. Definitions 6
and 7 and Theorems 8 to 1? are also found in the mathematical literature
but are less well known. In this exposition, all sets are taken to be
countable and all random variables discrete. In many instances
definitions and proofs are given only for finite sets, although it is clear
that most of the theorems hold for at least countable sets.

It is assumed that the reader is familiar with the elements of
statistics, and that he has some acquaintance with set notation and the
basic properties of inclusion, union, intersection, and set difference.
Some of the standard theorems that are assumed in the following are:

$A \cup B = B \cup A$

$A \cap B = B \cap A$

$A \cup (B \cup C) = (A \cup B) \cup C$

$A \subseteq A \cup B$

$A \cap B \subseteq A \cup B$

If $A \subseteq B$ and $B \subseteq C$ then $A \subseteq C$

If $A \subseteq B$ then $A \cap B = A$ and $A \cup B = B$.

For the cardinality of a set $X$, the notation $N(X)$ is used, and these
theorems (among others) are assumed:

If $A \subseteq B$ then $N(A) \le N(B)$

If $A \cap B = \emptyset$ then $N(A) + N(B) = N(A \cup B)$

$N(A) + N(B) = N(A \cup B) + N(A \cap B)$

$N(\emptyset) = 0$.



The casual reader may wish to omit proofs. A reader who is familiar
with simple properties of equivalence relations may start reading with
the comment preceding Definition 6. The comments are not essential and
may be omitted by any reader.

Definition 1. A classification of a set $S$ is a sequence $S_1, S_2, \dots, S_k$
with these properties:
(i) $S_i \subseteq S$ for each $i$
(ii) $S_1 \cup S_2 \cup \dots \cup S_k = S$
(iii) $S_i \cap S_j = \emptyset$ if $i \ne j$.

Comment. A classification is a means of subdividing a set $S$ into
subsets so that each member of $S$ is in one and only one of the subsets.
(This definition of classification is similar, but not identical, to the
more commonly used partition. In a partition, empty sets are not allowed,
whereas one or more of the sets in a classification may be empty.)
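
As an illustration only, the three conditions of Definition 1 can be checked mechanically for finite sets. The sketch below is not part of the original exposition; the function name `is_classification` is an assumption introduced here, and Python sets stand in for the sets $S_1, \dots, S_k$.

```python
def is_classification(subsets, s):
    """Return True if the sequence `subsets` is a classification of the set `s`."""
    s = set(s)
    subsets = [set(part) for part in subsets]
    # (i) every S_i is a subset of S
    if any(not part <= s for part in subsets):
        return False
    # (ii) the union of all S_i is S
    if set().union(*subsets) != s:
        return False
    # (iii) S_i and S_j have no common element when i != j
    for i in range(len(subsets)):
        for j in range(i + 1, len(subsets)):
            if subsets[i] & subsets[j]:
                return False
    return True


# An empty class is allowed in a classification (unlike a partition).
print(is_classification([{1, 2}, {3}, set()], {1, 2, 3}))
# Overlapping classes violate (iii).
print(is_classification([{1, 2}, {2, 3}], {1, 2, 3}))
```

The first call accepts a sequence containing an empty set, which is exactly the point on which a classification differs from a partition.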

Theorem 1. Suppose $S_1, S_2, \dots, S_k$ is a classification of $S$ and that
$S' \subseteq S$. Then $S_1 \cap S', S_2 \cap S', \dots, S_k \cap S'$ is a
classification of $S'$.

Proof:

Proof of (i): $S_i \cap S' \subseteq S'$ since $A \cap B \subseteq B$ for any sets $A$ and $B$.

Proof of (ii): $(S_1 \cap S') \cup (S_2 \cap S') \cup \dots \cup (S_k \cap S') =
(S_1 \cup S_2 \cup \dots \cup S_k) \cap S' = S \cap S'$. Now $S \cap S' = S'$ since
$S' \subseteq S$, so (ii) is established.

Proof of (iii): We write $S_i \cap S'$ as $S_i'$ for each $i \le k$ so that
$S_1 \cap S', S_2 \cap S', \dots, S_k \cap S' = S_1', S_2', \dots, S_k'$. Thus we have
$S_i' \cap S_j' = (S_i \cap S') \cap (S_j \cap S') = (S_i \cap S_j) \cap S'$. If $i \ne j$
we have $S_i \cap S_j = \emptyset$ so $(S_i \cap S_j) \cap S' = \emptyset$. Hence,
$S_i' \cap S_j' = \emptyset$.

Theorem 2. Suppose $S_1, S_2, \dots, S_k$ is a classification of a set $S$.
Then $S_1, S_2, \dots, S_{k-1}$ is a classification of $S - S_k$.

Proof: For each $i < k$ we have $S_i \cap (S - S_k) = (S_i \cap S) - (S_i \cap S_k) =
S_i \cap S = S_i$. For $i = k$, $S_k \cap (S - S_k) = (S_k \cap S) - (S_k \cap S_k) =
S_k - S_k = \emptyset$.
Hence, $S_1, S_2, \dots, S_{k-1}, \emptyset = S_1 \cap (S - S_k), S_2 \cap (S - S_k), \dots,
S_k \cap (S - S_k)$, which is a classification of $S - S_k$ by Theorem 1. It is
easily verified from the definition of a classification that if
$S_1, S_2, \dots, S_{k-1}, \emptyset$ is a classification of $S - S_k$, so is
$S_1, S_2, \dots, S_{k-1}$, and the proof is complete.

Theorem 3. Suppose $S_1, S_2, \dots, S_k$ is a classification of a set $S$.
Then

$$\sum_{i=1}^{k} N(S_i) = N(S),$$

where $N(S)$ is used to denote the number of elements in the set $S$.

Proof: The proof is by induction on $k$. If $k = 1$, then the
classification contains only one set $S_1$. By (ii) of Definition 1, $S_1 = S$. Hence,

$$\sum_{i=1}^{1} N(S_i) = N(S_1) = N(S).$$

Now, assume the theorem is true for $n < k$. $N(S) = N(S_1 \cup S_2 \cup \dots \cup S_k)$
by (ii) and

$$N(S) = N(S_1 \cup S_2 \cup \dots \cup S_{k-1}) + N(S_k) - N((S_1 \cup S_2 \cup \dots \cup S_{k-1}) \cap S_k),$$

since

$$N(A \cup B) = N(A) + N(B) - N(A \cap B)$$

for any sets $A$ and $B$. The set $(S_1 \cup S_2 \cup \dots \cup S_{k-1}) \cap S_k$ is empty,
however, since any element in $S_k$ cannot be in $S_i$ for $i \ne k$ by (iii).
Hence, we have

$$N(S) = N(S_1 \cup S_2 \cup \dots \cup S_{k-1}) + N(S_k).$$

The sequence $S_1, S_2, \dots, S_{k-1}$ is a classification of the set $S - S_k$ from
Theorem 2, so

$$N(S - S_k) = \sum_{i=1}^{k-1} N(S_i)$$

and

$$N(S) = N(S - S_k) + N(S_k) = \sum_{i=1}^{k-1} N(S_i) + N(S_k) = \sum_{i=1}^{k} N(S_i).$$

Definition 2. An equivalence relation $R$ on a set $S$ is a set of
ordered pairs $\{(x,y)\}$ with the following properties:
(i) If $(x,y) \in R$ then $x \in S$ and $y \in S$.
(ii) If $x \in S$ then $(x,x) \in R$.
(iii) If $(x,y) \in R$ then $(y,x) \in R$.
(iv) If $(x,y) \in R$ and $(y,z) \in R$ then $(x,z) \in R$.

Comment. An equivalence relation contains the pairs $(x,y)$ for which
$x$ is equivalent to $y$. Thus (i) states that we are concerned only with
pairs from the given set $S$; (ii), which is known as the reflexive
property, states that every element of $S$ is equivalent to itself; (iii)
says that if $x$ is equivalent to $y$, then $y$ is equivalent to $x$ (this is
sometimes phrased 'an equivalence relation is symmetric'); (iv) states
that if $x$ is equivalent to $y$ and $y$ is equivalent to $z$, then $x$ is equivalent
to $z$ (this is the transitive property, which is shared by a large number
of relations that are not equivalence relations, such as $<$, $>$, $\subset$, etc.).



Theorem 4. Suppose $S_1, S_2, \dots, S_k$ is a classification of $S$. Then
$R = \{(x,y) : x, y \in S_i \text{ for some } i \text{ such that } 1 \le i \le k\}$ is an equivalence
relation.

Proof:

Proof of (i): If $(x,y) \in R$ then $x \in S_i$ and $y \in S_i$ for some $i$. Hence,
$x \in S_1 \cup S_2 \cup \dots \cup S_k$ and $y \in S_1 \cup S_2 \cup \dots \cup S_k$. By Definition 1,
$S_1 \cup S_2 \cup \dots \cup S_k = S$. Hence, $x \in S$ and $y \in S$.

Proof of (ii): If $x \in S$ then $x \in S_i$ for some $i$ by (ii) of Definition
1. Hence, $(x,x) \in R$.

Proof of (iii): If $(x,y) \in R$ then $x \in S_i$ and $y \in S_i$ for some $i$.
Hence, $y \in S_i$ and $x \in S_i$ so $(y,x) \in R$.

Proof of (iv): If $(x,y) \in R$ and $(y,z) \in R$ then $x \in S_i$ and $y \in S_i$
for some $i$ and $y \in S_j$ and $z \in S_j$ for some $j$. Hence, $y \in S_i \cap S_j$ so
$S_i \cap S_j$ is not empty. By (iii) of Definition 1, $i = j$. Hence, $x \in S_i$
and $z \in S_j = S_i$ so $(x,z) \in R$.

Comment. From Theorem 4 we know that every classification defines
a unique equivalence relation that is formed by taking all possible pairs
from $S_1$, including those of the form $(x,x)$, together with all possible
pairs from $S_2$, etc. The classes $S_1, S_2, \dots$ are known as equivalence
classes. The converse of Theorem 4 is also true, although it is not
proved here; every equivalence relation defines a classification. (The
classification defined by an equivalence is not unique, because the order
of classes may vary and because one or more empty sets are allowed.
However, it is essentially unique, i.e., unique up to empty sets and order.)
In the following it may be assumed that every equivalence relation is
defined by means of some classification.
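
The construction in Theorem 4 can be sketched for small finite sets as follows. This is an illustration under stated assumptions, not part of the original text: the names `relation_from_classification` and `is_equivalence_relation` are hypothetical, and a relation is represented as a Python set of ordered pairs.

```python
def relation_from_classification(subsets):
    """Build R = {(x, y): x and y lie in the same class S_i} (Theorem 4)."""
    return {(x, y) for part in subsets for x in part for y in part}


def is_equivalence_relation(r, s):
    """Check properties (i) to (iv) of Definition 2 for a finite relation r on s."""
    s = set(s)
    in_s = all(x in s and y in s for x, y in r)                 # (i)
    reflexive = all((x, x) in r for x in s)                     # (ii)
    symmetric = all((y, x) in r for x, y in r)                  # (iii)
    transitive = all((x, z) in r
                     for x, y in r for w, z in r if y == w)     # (iv)
    return in_s and reflexive and symmetric and transitive


classes = [{"a", "b"}, {"c"}, set()]
R = relation_from_classification(classes)
print(sorted(R))
print(is_equivalence_relation(R, {"a", "b", "c"}))
```

The empty class contributes no pairs, which is why the relation is unaffected by adding or dropping empty sets from the classification.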



Theorem 5. Suppose $S_1, S_2, \dots, S_k$ is a classification of $S$ and
$R = \{(x,y) : x, y \in S_i \text{ for some } i\}$. Then

$$N(R) = \sum_{i=1}^{k} N^2(S_i).$$

Proof: (deferred)

Comment. From the above theorem, we know that the number of elements
(pairs) in an equivalence relation is the sum of the squares of the
numbers of elements in the equivalence classes. The proof of this theorem
depends upon a more general theorem concerning the number of elements in
the cross product of two sets:

Definition 3. If $S$ and $S'$ are two sets, the cross product of $S$ and $S'$,
written $S \times S'$, is $\{(x,y) : x \in S \text{ and } y \in S'\}$.

Lemma. $N(S \times S') = N(S) \cdot N(S')$.

Proof: The proof is by induction on $N(S)$.

If $N(S) = 1$ then $S = \{s\}$ for some $s$ and $S \times S' = \{(s,y) : y \in S'\}$.
Let $f(x) = (s,x)$ for $x \in S'$. Then $f$ is a one-one correspondence
between $S'$ and $S \times S'$ so $N(S') = N(S \times S')$. Since $N(S) = 1$ we have
$N(S \times S') = N(S) \cdot N(S')$.

Assume $N(S \times S') = N(S) \cdot N(S')$ if $N(S) < n$, and let
$S'' = \{s_1, s_2, \dots, s_n\}$ be a set with $n$ members. Then
$S'' \times S' = \{(x,y) : x \in S'' \text{ and } y \in S'\}
= \{(x,y) : x \in S'' - \{s_n\} \text{ and } y \in S'\} \cup \{(s_n, y) : y \in S'\}$, which
is a union of disjoint sets since $s_n \notin S'' - \{s_n\}$. Hence,

$$\begin{aligned}
N(S'' \times S') &= N(\{(x,y) : x \in S'' - \{s_n\} \text{ and } y \in S'\}) + N(\{(s_n, y) : y \in S'\}) \\
&= N((S'' - \{s_n\}) \times S') + N(\{s_n\} \times S') \\
&= N(S'' - \{s_n\}) \cdot N(S') + N(\{s_n\}) \cdot N(S') \\
&= [N(S'' - \{s_n\}) + N(\{s_n\})] \cdot N(S') \\
&= N(S'') \cdot N(S').
\end{aligned}$$



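Proof of Theorem 5. We prove that $N(R) = \sum_{i=1}^{k} N(S_i \times S_i)$.

Let $S_i \times S_i = \{(x,y) : x, y \in S_i\}$ be the cross product of $S_i$ with itself.
It is readily seen that
$R \subseteq S_1 \times S_1 \cup S_2 \times S_2 \cup \dots \cup S_k \times S_k$. Suppose
$(x,y) \in S_1 \times S_1 \cup S_2 \times S_2 \cup \dots \cup S_k \times S_k$. Then
$(x,y) \in S_i \times S_i$ for some $i$. Hence, $x \in S_i$ and
$y \in S_i$ so $(x,y) \in R$ by the definition of $R$, and we have shown that
$S_1 \times S_1 \cup S_2 \times S_2 \cup \dots \cup S_k \times S_k \subseteq R$. Since the inclusion
holds both ways, the two sets are equal.

Now $S_i \times S_i \cap S_j \times S_j = \emptyset$ if $i \ne j$ since $(x,y) \in S_i \times S_i$ and
$(x,y) \in S_j \times S_j$ imply that $x$ and $y$ are elements of both $S_i$ and $S_j$, which
is impossible by (iii) of Definition 1.

Hence, $R$ is a union of disjoint sets, so $N(R) = \sum_{i=1}^{k} N(S_i \times S_i)$.
From the lemma, $N(S_i \times S_i) = N(S_i) \cdot N(S_i)$ so

$$N(R) = \sum_{i=1}^{k} N^2(S_i).$$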
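
A small numerical check of Theorem 5 may help fix the idea. The sketch below is illustrative only; the helper name `relation_from_classification` and the example classes are assumptions introduced here, not taken from the study's data.

```python
def relation_from_classification(subsets):
    """Build R = {(x, y): x and y lie in the same class S_i}."""
    return {(x, y) for part in subsets for x in part for y in part}


# Classes of sizes 3, 2, 1 and 0 on the set {1, ..., 6}.
classes = [{1, 2, 3}, {4, 5}, {6}, set()]
R = relation_from_classification(classes)

pairs_counted = len(R)                                      # N(R), counted directly
sum_of_squares = sum(len(part) ** 2 for part in classes)    # 9 + 4 + 1 + 0

print(pairs_counted, sum_of_squares)
```

Both quantities come out to 14 for this example, as the theorem requires.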






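Theorem 6. If $R \subseteq R'$ then

$$\sum_{i=1}^{k} N^2(S_i) \le \sum_{i=1}^{k'} N^2(S_i'),$$

where $S_1, \dots, S_k$ and $S_1', \dots, S_{k'}'$ are the equivalence classes of $R$ and $R'$,
respectively.

Proof: Since $R \subseteq R'$,

$$N(R) \le N(R').$$

Hence,

$$\sum_{i=1}^{k} N^2(S_i) \le \sum_{i=1}^{k'} N^2(S_i')$$

by Theorem 5.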



Definition 4. The identity relation $I$ on a set $S$ is $\{(x,x) : x \in S\}$.

Theorem 7. $I$ is an equivalence relation.

Comment. If a set $S$ has $n$ elements the relation $I$ contains $n$ pairs
and the classification defined by $I$ is $n$ sets of the
form $\{x\}$. Under the identity relation no element is equivalent to any
element other than itself. The opposite of this is the universal relation
$U$, defined below, under which any two elements in $S$ are equivalent.

Definition 5. The universal relation $U$ on a set $S$ is $\{(x,y) : x, y \in S\}$.

Theorem 8. $U$ is an equivalence relation.

Comment. If a set $S$ has $n$ members then $U$ contains $n^2$ pairs. The
classification defined by $U$ consists of only one subset of $S$, namely,
$S$ itself.

Theorem 9. For any equivalence relation $R$,

$$I \subseteq R \subseteq U.$$

Proof: If $(x,y) \in I$ then $y = x \in S$. Hence, $(x,y) \in R$. If
$(x,y) \in R$ then $x \in S$ and $y \in S$ so $(x,y) \in U$.

Comment. In the above theorem, as in many of the following, it is
assumed that all equivalence relations are defined on the same set $S$.

Theorem 10. If $R$ and $R'$ are equivalence relations, then $R \cap R'$ is
an equivalence relation.

Proof (referring to Definition 2):

Proof of (i): If $(x,y) \in R \cap R'$ then $(x,y) \in R$ so $x$ and $y$ are in $S$.

Proof of (ii): If $x \in S$ then $(x,x) \in R$ and $(x,x) \in R'$. Hence,
$(x,x) \in R \cap R'$.

Proof of (iii): If $(x,y) \in R \cap R'$ then $(x,y) \in R$ and $(x,y) \in R'$.
Hence, $(y,x) \in R$ and $(y,x) \in R'$ so $(y,x) \in R \cap R'$.

Proof of (iv): Assume $(x,y) \in R \cap R'$ and $(y,z) \in R \cap R'$. Then
$(x,y) \in R$ and $(y,z) \in R$ so $(x,z) \in R$. Similarly, $(x,z) \in R'$. Hence,
$(x,z) \in R \cap R'$.

Comment. Although $R \cap R'$ is an equivalence relation, it is not
always the case that $R \cup R'$ is an equivalence relation. However, $R \cup R'$
does satisfy (i), (ii), and (iii) of an equivalence relation. To satisfy
(iv) we add enough pairs so that $R \cup R'$ is closed under transitivity.
This result we call $R \oplus R'$. In the mathematical literature the set
$R \oplus R'$ is called the transitive closure of $R$ and $R'$. The reader who is
not interested in the technical details of this development may skip to
Theorem 19.

Definition 6. Given two equivalence relations $R$ and $R'$, the $n$th
extension of $R$ and $R'$, denoted $\mathrm{Ext}(n)$, is defined as follows.
(i) $\mathrm{Ext}(1) = R \cup R'$.
(ii) $\mathrm{Ext}(n) = \mathrm{Ext}(n-1) \cup \{(x,y) : \exists z \text{ such that } (x,z) \in \mathrm{Ext}(n-1) \text{ and }
(z,y) \in \mathrm{Ext}(n-1)\}$.

Theorem 11. For any equivalence relations $R$ and $R'$, and for any $n$,
$\mathrm{Ext}(n) \subseteq U$.

Proof: The proof is by induction on $n$. For $n = 1$, $\mathrm{Ext}(n) = R \cup R'$.
Assume $x \in R \cup R'$. Then $x \in R$ or $x \in R'$. Hence, by Theorem 9, $x \in U$
and we conclude that $\mathrm{Ext}(1) \subseteq U$.

Assume $\mathrm{Ext}(n-1) \subseteq U$. Let $(x,y)$ be a member of $\mathrm{Ext}(n)$. Either
$(x,y) \in \mathrm{Ext}(n-1)$ or $(x,y) \in \{(x,y) : \exists z \text{ such that } (x,z) \in \mathrm{Ext}(n-1) \text{ and }
(z,y) \in \mathrm{Ext}(n-1)\}$. If $(x,y) \in \mathrm{Ext}(n-1)$ then $(x,y) \in U$ by assumption.
Otherwise, $(x,z) \in \mathrm{Ext}(n-1)$ and $(z,y) \in \mathrm{Ext}(n-1)$ for some $z$. By
assumption then, $(x,z) \in U$ and $(z,y) \in U$. Hence, $x$ and $y$ are elements of $S$,
and so $(x,y) \in U$. Q.E.D.

Theorem 12. $N(\mathrm{Ext}(n)) \le N(U)$ for any $n$.

Proof: This follows directly from Theorem 11 and the fact that, for
any sets $A$ and $B$, if $A \subseteq B$ then $N(A) \le N(B)$.

Theorem 13. $N(\mathrm{Ext}(n)) \le N(\mathrm{Ext}(n+1))$ for any $n$.

Proof: From Definition 6, we have $\mathrm{Ext}(n-1) \subseteq \mathrm{Ext}(n)$ for $n > 1$.

Comment. Theorem 13 shows that each extension of $R$ and $R'$ is at
least as large as the preceding extension. Theorem 14 states that at
some point the extensions do not become larger.

Theorem 14. For some $n$, $N(\mathrm{Ext}(n)) = N(\mathrm{Ext}(n+1))$.

Proof: The sequence

$$N(\mathrm{Ext}(1)),\ N(\mathrm{Ext}(2)),\ N(\mathrm{Ext}(3)),\ \dots$$

is a monotonically increasing sequence of integers with an upper bound,
$N(U)$, by Theorems 12 and 13. Such a sequence cannot increase
indefinitely, so $N(\mathrm{Ext}(n)) = N(\mathrm{Ext}(n+1))$ for some $n$.

Comment. Theorem 14 states that at some point some extension of $R$
and $R'$ is identical in number to the next extension. A stronger result,
below, is that these two extensions are identical in extent as well as
in number.

Theorem 15. For some $n$, $\mathrm{Ext}(n) = \mathrm{Ext}(n+1)$.

Proof: From Theorem 14, there is an $n$ such that $N(\mathrm{Ext}(n)) =
N(\mathrm{Ext}(n+1))$. We know $\mathrm{Ext}(n) \subseteq \mathrm{Ext}(n+1)$ from the definition of $\mathrm{Ext}$, so
there is a subset $X$ of $\mathrm{Ext}(n+1)$ such that $\mathrm{Ext}(n+1) - X = \mathrm{Ext}(n)$. Then
$N(\mathrm{Ext}(n+1)) - N(X) = N(\mathrm{Ext}(n))$. But $N(X)$ must be 0 since $N(\mathrm{Ext}(n)) =
N(\mathrm{Ext}(n+1))$, so $X = \emptyset$.

Theorem 16. For any $n$, $R \cup R' \subseteq \mathrm{Ext}(n)$.

Proof: The proof is by induction on $n$. For $n = 1$, $\mathrm{Ext}(n) = R \cup R'$,
so $R \cup R' \subseteq \mathrm{Ext}(n)$.

Assume $R \cup R' \subseteq \mathrm{Ext}(n-1)$. By (ii) of Definition 6, $\mathrm{Ext}(n-1) \subseteq \mathrm{Ext}(n)$.
Hence, by transitivity of $\subseteq$ we have $R \cup R' \subseteq \mathrm{Ext}(n)$.

Comment. The following theorem shows that as soon as the extensions
of $R$ and $R'$ stop increasing in size, the result is an equivalence
relation. This means that we have added enough pairs to $R \cup R'$ to form
an equivalence relation.

Theorem 17. If $\mathrm{Ext}(n) = \mathrm{Ext}(n+1)$ then $\mathrm{Ext}(n)$ is an equivalence
relation.

Proof (referring to Definition 2):

Proof of (i): If $(x,y) \in \mathrm{Ext}(n)$ then $(x,y) \in U$ by Theorem 11.
Hence, $x \in S$ and $y \in S$.

Proof of (ii): If $x \in S$ then $(x,x) \in R$ since $R$ is an equivalence
relation. Hence, $(x,x) \in R \cup R'$, and by Theorem 16, $(x,x) \in \mathrm{Ext}(n)$.

Proof of (iii): The proof is by induction.

If $(x,y) \in \mathrm{Ext}(1)$ then $(x,y) \in R \cup R'$ so $(x,y) \in R$ or $(x,y) \in R'$.
If $(x,y) \in R$ then $(y,x) \in R$ since $R$ is an equivalence relation. Hence,
$(y,x) \in R \cup R'$ so $(y,x) \in \mathrm{Ext}(1)$. If $(x,y) \in R'$ a similar argument holds.
Now assume that (iii) holds for $\mathrm{Ext}(m-1)$ and let $(x,y)$ be a member of
$\mathrm{Ext}(m)$. If $(x,y) \in \mathrm{Ext}(m-1)$, then $(y,x) \in \mathrm{Ext}(m-1)$ by assumption so
$(y,x) \in \mathrm{Ext}(m)$. If $(x,y) \notin \mathrm{Ext}(m-1)$, then there is a $z$ such that
$(x,z) \in \mathrm{Ext}(m-1)$ and $(z,y) \in \mathrm{Ext}(m-1)$. Then $(z,x)$ and $(y,z)$ are also in
$\mathrm{Ext}(m-1)$ by assumption so $(y,x) \in \mathrm{Ext}(m)$.

Proof of (iv): Assume $(x,y) \in \mathrm{Ext}(n)$ and $(y,z) \in \mathrm{Ext}(n)$. By (ii)
of Definition 6, $(x,z) \in \mathrm{Ext}(n+1)$. But $\mathrm{Ext}(n) = \mathrm{Ext}(n+1)$ by hypothesis,
so $(x,z) \in \mathrm{Ext}(n)$.

Comment. Theorem 18 shows that once the extensions of $R$ and $R'$ have
achieved the status of an equivalence relation, there are no further
additions and all of the succeeding extensions are identical.

Theorem 18. If $\mathrm{Ext}(n)$ is an equivalence relation then $\mathrm{Ext}(n) = \mathrm{Ext}(m)$
for any $m \ge n$.

Proof: The proof is by induction on $k = m - n$, so that $m = n + k$.

If $m - n = 0$ then $\mathrm{Ext}(m) = \mathrm{Ext}(n)$, which is all that is needed.
Now, assume $\mathrm{Ext}(n+k) = \mathrm{Ext}(n)$. Then $\mathrm{Ext}(n+k+1) = \mathrm{Ext}(n+k) \cup \{(x,y) : \exists z
\text{ such that } (x,z) \in \mathrm{Ext}(n+k) \text{ and } (z,y) \in \mathrm{Ext}(n+k)\}$. We will show that
$\{(x,y) : \exists z \text{ such that } (x,z) \in \mathrm{Ext}(n+k) \text{ and } (z,y) \in \mathrm{Ext}(n+k)\} \subseteq \mathrm{Ext}(n+k)$.
Let $(x,y)$ be a member of $\{(x,y) : \exists z \text{ such that } (x,z) \in \mathrm{Ext}(n+k) \text{ and }
(z,y) \in \mathrm{Ext}(n+k)\}$. Then there is a $z$ such that $(x,z)$ and $(z,y)$ are in
$\mathrm{Ext}(n+k)$. By the inductive assumption, $(x,z)$ and $(z,y)$ are in $\mathrm{Ext}(n)$.
Since $\mathrm{Ext}(n)$ is an equivalence relation, $(x,y) \in \mathrm{Ext}(n)$ so $(x,y) \in \mathrm{Ext}(n+k)$.
Hence, $\mathrm{Ext}(n+k+1) = \mathrm{Ext}(n+k) = \mathrm{Ext}(n)$.

Comment. We have shown that the sequence of extensions of $R$ and $R'$
has a limit that is an equivalence relation. This limit is taken to be
the sum of $R$ and $R'$.

Definition 7. The sum of two equivalence relations, $R$ and $R'$,
denoted $R \oplus R'$, is $\mathrm{Ext}(n)$ for some value of $n$ such that $\mathrm{Ext}(n)$ is an
equivalence relation.

Theorem 19. For any $R$ and $R'$, $R \oplus R'$ exists and is unique.

Proof: This theorem and proof are for $R$ and $R'$ finite. The existence
of $R \oplus R'$ follows from Theorems 15 and 17, and the uniqueness
follows from Theorem 18.

Theorem 20. $\oplus$ is commutative and associative.

Proof: Both of these properties follow from the analogous properties
for $\cup$.
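
For finite relations, the extensions of Definition 6 and the sum of Definition 7 can be computed directly. The following is a minimal sketch under stated assumptions, not the authors' procedure: the names `extend`, `relation_sum` and `relation_from_classification` are introduced here for illustration, and relations are Python sets of ordered pairs.

```python
def relation_from_classification(subsets):
    """Build R = {(x, y): x and y lie in the same class S_i}."""
    return {(x, y) for part in subsets for x in part for y in part}


def extend(ext):
    """One step of Definition 6 (ii): add (x, y) whenever (x, z) and (z, y) are present."""
    return ext | {(x, y) for x, z in ext for w, y in ext if z == w}


def relation_sum(r, r_prime):
    """Return (sum of R and R', n), where n is the first index with Ext(n) == Ext(n+1)."""
    ext, n = set(r) | set(r_prime), 1        # Ext(1) is the union of R and R'
    while True:
        nxt = extend(ext)
        if nxt == ext:                       # Theorem 15: the extensions stop growing
            return ext, n
        ext, n = nxt, n + 1


S = {1, 2, 3}
R = relation_from_classification([{1, 2}, {3}])
R_prime = relation_from_classification([{1}, {2, 3}])
total, n = relation_sum(R, R_prime)
print(n, sorted(total))
```

In this example the union lacks the pairs (1, 3) and (3, 1); they are added in the second extension, after which the relation is the universal relation on S and no further pairs appear.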

Definition 8. Two equivalence relations $R$ and $R'$ are independent
in $S$ if there are no $x, y \in S$ such that $x \ne y$ and $(x,y) \in R$ and $(x,y) \in R'$.

Comment. If $R$ and $R'$ are independent and $x$ and $y$ are two different
members of $S$, then $x$ and $y$ will not be equivalent under both $R$ and $R'$.

Theorem 21. $R$ and $R'$ are independent equivalence relations if and
only if $R \cap R' = I$.

Proof:

Proof of the forward implication: The proof is by contradiction.
Assume that $R$ and $R'$ are independent but $R \cap R' \ne I$. We know $I \subseteq R \cap R'$
from Theorem 9, so there is a pair $(x,y)$ that is in $R \cap R'$ but not in
$I$. Since $(x,y) \notin I$ implies that $x \ne y$, we have $(x,y) \in R$ and $(x,y) \in R'$
and $x \ne y$, which contradicts the assumption that $R$ and $R'$ are independent.

The proof of the reverse implication is similar.

Comment. The condition that $R \cap R' = I$ is equivalent to the
definition of independence and could have been taken as the definition.
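
The criterion of Theorem 21 gives a direct test for independence on finite sets. The sketch below is illustrative only; the function names and the example classifications are assumptions introduced here.

```python
def relation_from_classification(subsets):
    """Build R = {(x, y): x and y lie in the same class S_i}."""
    return {(x, y) for part in subsets for x in part for y in part}


def identity_relation(s):
    """The identity relation I on s (Definition 4)."""
    return {(x, x) for x in s}


def are_independent(r, r_prime, s):
    """Theorem 21: R and R' are independent exactly when their intersection is I."""
    return set(r) & set(r_prime) == identity_relation(s)


S = {1, 2, 3, 4}
rows = relation_from_classification([{1, 2}, {3, 4}])
cols = relation_from_classification([{1, 3}, {2, 4}])
finer = relation_from_classification([{1, 2}, {3}, {4}])

print(are_independent(rows, cols, S))    # no two distinct elements agree under both
print(are_independent(rows, finer, S))   # 1 and 2 are equivalent under both
```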

Definition 9. Two equivalence relations $R$ and $R'$ are interactive
in $S$ if there exist $x, y \in S$ such that $(x,y) \in R \oplus R'$ and $(x,y) \notin R$ and
$(x,y) \notin R'$.

Comment. Two equivalence relations are considered to be interactive
if the combination (sum) has the power to equate two elements that are
not equivalent using either of the relations alone.

Theorem 22. $R$ and $R'$ are not interactive if and only if
$R \oplus R' = R \cup R'$.

Proof:

Proof of the forward implication: If $R$ and $R'$ are not interactive,
then there is no pair $(x,y)$ such that $(x,y) \in R \oplus R'$ and $(x,y) \notin R \cup R'$.
Hence, $R \oplus R' \subseteq R \cup R'$. We know from Theorem 16 that $R \cup R' \subseteq R \oplus R'$.
Hence, $R \cup R' = R \oplus R'$.

Proof of the reverse implication: Assume $R$ and $R'$ are interactive.
Then there is a pair $(x,y)$ such that $(x,y) \in R \oplus R'$ and $(x,y) \notin R \cup R'$.
Hence, $R \oplus R' \ne R \cup R'$.

Theorem 23. $R$ and $R'$ are not interactive if and only if $R \cup R'$ is
an equivalence relation.

Proof: $R$ and $R'$ are not interactive if and only if $R \oplus R' = R \cup R'$
by Theorem 22. By Definition 6, $R \cup R' = \mathrm{Ext}(1)$. By Definition 7 and
Theorem 15, $R \oplus R' = \mathrm{Ext}(1)$ if and only if $\mathrm{Ext}(1)$ is an equivalence
relation.

Comment. Necessary and sufficient conditions for noninteractivity
are either

(1) $R \oplus R' = R \cup R'$

or

(2) $R \cup R'$ is an equivalence relation.

Either of these conditions could have been used to define noninteractivity.
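
Condition (1) above suggests a simple computational test for interactivity on finite sets: form the union, close it under transitivity, and see whether anything was added. The sketch below is an illustration under that assumption; `transitive_closure`, `are_interactive` and the example classifications are names and data introduced here, not part of the original text.

```python
def relation_from_classification(subsets):
    """Build R = {(x, y): x and y lie in the same class S_i}."""
    return {(x, y) for part in subsets for x in part for y in part}


def transitive_closure(pairs):
    """Repeatedly add (x, y) whenever (x, z) and (z, y) are present, until stable."""
    closure = set(pairs)
    while True:
        added = {(x, y) for x, z in closure for w, y in closure if z == w} - closure
        if not added:
            return closure
        closure |= added


def are_interactive(r, r_prime):
    """R and R' are interactive exactly when their sum is larger than their union."""
    union = set(r) | set(r_prime)
    return transitive_closure(union) != union


# 1~2 under R and 2~3 under R': the sum also equates 1 and 3.
R = relation_from_classification([{1, 2}, {3}])
R_prime = relation_from_classification([{1}, {2, 3}])
print(are_interactive(R, R_prime))

# Here the union is already an equivalence relation, so nothing new is equated.
Q = relation_from_classification([{1, 2}, {3}, {4}])
Q_prime = relation_from_classification([{1}, {2}, {3, 4}])
print(are_interactive(Q, Q_prime))
```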

We turn now to the notion of the coherence of a classified set:

Definition 10. Suppose a set (population) $S$ is classified by
$S_1, S_2, \dots, S_k$ and that the probability of an element being in equivalence
class $S_i$ is $p_i$. The coherence of the classified set $S$ is defined to be

$$\gamma = \sum_{i=1}^{k} p_i^2.$$

Comment. The value of $\gamma$ depends upon a specified classification.
If the equivalence relation defined by the classification is $R$, we
sometimes use the notation $\gamma_R$. The coherence of a classified set is simply
the probability that two elements drawn at random are in the same
equivalence class. Corresponding to the population parameter $\gamma$ there is a
sample statistic $c$.

Definition 11. Suppose a finite set $S$ is classified by
$S_1, S_2, \dots, S_k$ and that the corresponding equivalence relation is $R$.
The sample coherence $c$ is

$$c = \frac{N(R)}{(N(S))^2}.$$

Comment. The formula for $c$ can also be written

$$c = \frac{\sum_{i=1}^{k} N^2(S_i)}{(N(S))^2}.$$

This last formulation shows more clearly the relation between $c$ and $\gamma$,
while the formulation used in the definition shows that $c$ is the ratio
of the number of equivalent pairs to the number of possible pairs. Again
we use $c_R$ to indicate the dependence of $c$ on the specified equivalence
relation.

For research purposes, $c$ is not a suitable estimator of $\gamma$ because
it is biased, and we define an estimator $\hat{c}$, which we will later show to
be an unbiased estimator of $\gamma$.

Definition 12. Suppose a set $S$ is classified by $S_1, S_2, \dots, S_k$
and that the corresponding equivalence relation is $R$. The estimated
coherence of $S$ is

$$\hat{c} = \frac{N(R) - N(S)}{N(S) \cdot (N(S) - 1)}.$$

Comment. $\hat{c}$ can also be written

$$\hat{c} = \frac{\sum_{i=1}^{k} (N(S_i))^2 - N(S)}{N(S) \cdot (N(S) - 1)}$$

or

$$\hat{c} = \sum_{i=1}^{k} \frac{N(S_i) \cdot (N(S_i) - 1)}{N(S) \cdot (N(S) - 1)}.$$

From this it can be seen that $\hat{c}$ is the probability that two elements are
in the same equivalence class when the drawing is without replacement
(for $c$, the drawing is with replacement). The values of $c$ and $\hat{c}$ are
compared in the following theorems.
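
Both statistics depend only on the sizes of the equivalence classes, so they are easy to compute. The sketch below is illustrative and rests on assumptions: the function names are introduced here, the class sizes are invented for the example, and exact fractions are used so that the values can be compared without rounding.

```python
from fractions import Fraction


def sample_coherence(class_sizes):
    """c: sum of squared class sizes over N(S) squared (drawing with replacement)."""
    n = sum(class_sizes)
    return Fraction(sum(k * k for k in class_sizes), n * n)


def estimated_coherence(class_sizes):
    """c-hat: sum of N(S_i)(N(S_i) - 1) over N(S)(N(S) - 1) (drawing without replacement).

    Requires at least two elements in S.
    """
    n = sum(class_sizes)
    return Fraction(sum(k * (k - 1) for k in class_sizes), n * (n - 1))


sizes = [3, 2, 1]                      # six elements in three classes
print(sample_coherence(sizes))         # 14/36 reduces to 7/18
print(estimated_coherence(sizes))      # 8/30 reduces to 4/15

# Identity relation on four elements: every class has one member.
print(sample_coherence([1, 1, 1, 1]), estimated_coherence([1, 1, 1, 1]))
# Universal relation on four elements: a single class.
print(sample_coherence([4]), estimated_coherence([4]))
```

The last two lines show the extreme cases: under the identity relation the sample coherence is 1/N(S) while the estimated coherence is 0, and under the universal relation both are 1.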

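Theorem 24. $c_I = \frac{1}{N(S)}$ and $\hat{c}_I = 0$.

Proof: The value of $N(I)$, where $I$ is the identity relation, is $N(S)$,
so the theorem follows from direct substitution into the formulas given
in the definitions of $c$ and $\hat{c}$.

Theorem 25. $c_U = \hat{c}_U = 1$.

Proof: For the universal relation $U$, $N(U) = (N(S))^2$, so the theorem
follows from direct substitution.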





Theorem 26.

$$\hat{c} = \frac{N(S) \cdot c - 1}{N(S) - 1}.$$

Proof: This can be verified by direct substitution.
Theorem 27. If R and R' aVe equivalenb^jre.lations and R e then 
c < c Further, R = R' if. and ona.y if c_ =. 



Proof: Sine^e R c R' we have N^R) < N(R'),^ and h^tiqe, 

— - . X. 



N(R) 



< 



N(R') 



(N(S))^ ^ -(^(3))^ 



Since R and R' are defined on the same set S, it follows that < c^, 
Now assume R ^ R' and = c^,. Then N(E) = N( R' ) so R = R' . 
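Theorem 28. If $R \subseteq R'$ then $\hat{c}_R \le \hat{c}_{R'}$.

Proof: This follows from Theorem 27 together with the relation between
$c$ and $\hat{c}$.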



Theorem 29. For any equivalence relation $R$, $c_I \le c_R \le c_U$ and
$\hat{c}_I \le \hat{c}_R \le \hat{c}_U$.




Theorem 30. The range of $c$ is contained in $(0,1]$ and the range of $\hat{c}$
is contained in $[0,1]$.

Comment. Range is used here in the mathematical sense: the set of
all values that can be assumed by the function $c$. The notation $(0,1]$ is
used to denote the interval from 0 to 1 excluding 0 but including 1.
Similarly, $[0,1]$ includes both 0 and 1. From Theorem 29 we see that the
greatest coherence is attained when the universal relation is used. The
least coherence results from using the identity relation, under which no
two different elements are equivalent.
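The two extreme cases can be checked numerically. This is an illustrative sketch only; the helper name `coherences` and the set size are assumptions made for the example, and a classification is represented simply by its list of class sizes.

```python
from fractions import Fraction


def coherences(class_sizes):
    """Return (c, c_hat) for a classification given by its class sizes."""
    n = sum(class_sizes)
    pairs = sum(m * m for m in class_sizes)  # N(R): ordered pairs in R
    return Fraction(pairs, n * n), Fraction(pairs - n, n * (n - 1))


n = 5                # hypothetical set size N(S)
identity = [1] * n   # identity relation: every element in its own class
universal = [n]      # universal relation: all elements in one class

print(coherences(identity))    # c = 1/5, c_hat = 0
print(coherences([2, 2, 1]))   # an intermediate case: c = 9/25, c_hat = 1/5
print(coherences(universal))   # c = 1,   c_hat = 1
```

The intermediate classification falls strictly between the two extremes for both statistics, as the range statement leads one to expect.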

Theorem 31. $\hat{c}_{R \oplus R'} \ge \hat{c}_R$ for any equivalence relations $R$ and $R'$.

Proof: Since $R \subseteq R \oplus R'$, this follows from Theorem 28.
Theorem 32. If $R$ and $R'$ are not interactive, then

$$c_R + c_{R'} = c_{R \oplus R'} + c_{R \cap R'} .$$

Proof: For any sets $R$ and $R'$,

$$N(R) + N(R') = N(R \cup R') + N(R \cap R') .$$

Since $R$ and $R'$ are not interactive, we have from Theorem 22 that

$$R \oplus R' = R \cup R' .$$

Hence,

$$N(R) + N(R') = N(R \oplus R') + N(R \cap R') ,$$

so

$$\frac{N(R)}{(N(S))^2} + \frac{N(R')}{(N(S))^2} = \frac{N(R \oplus R')}{(N(S))^2} + \frac{N(R \cap R')}{(N(S))^2} .$$

Since $R \oplus R'$ and $R \cap R'$ are equivalence relations defined on the same set
as $R$ and $R'$, we have

$$c_R + c_{R'} = c_{R \oplus R'} + c_{R \cap R'} .$$
Theorem 33. If $R$ and $R'$ are not interactive, then

$$\hat{c}_R + \hat{c}_{R'} = \hat{c}_{R \oplus R'} + \hat{c}_{R \cap R'} .$$

Theorem 34. If $R$ and $R'$ are independent and not interactive, then

$$\hat{c}_R + \hat{c}_{R'} = \hat{c}_{R \oplus R'} .$$

Proof: From Theorem 33 we have

$$\hat{c}_R + \hat{c}_{R'} = \hat{c}_{R \oplus R'} + \hat{c}_{R \cap R'}$$

since $R$ and $R'$ are not interactive. Since $R$ and $R'$ are independent,
$R \cap R' = I$ by Theorem 21. By Theorem 24, $\hat{c}_I = 0$, and we have
$\hat{c}_{R \cap R'} = 0$, which completes the proof.

Comment. Under the strong hypotheses that $R$ and $R'$ are independent
and are not interactive in $S$, Theorem 34 shows that the sum of the
coherence due to $R$ and $R'$ is the same as the coherence due to the sum
of $R$ and $R'$.
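A small numerical case may make the additivity property concrete. The sketch below is illustrative and rests on stated assumptions: relations are represented as sets of ordered pairs, the four-element set and both classifications are hypothetical, and the example is chosen so that the union of the two relations is already an equivalence relation (so that the union can stand in for the sum of $R$ and $R'$) and their intersection is the identity relation.

```python
from fractions import Fraction
from itertools import product


def relation_from_classes(classes):
    """Build an equivalence relation as a set of ordered pairs."""
    return {(a, b) for cls in classes for a, b in product(cls, repeat=2)}


def c_hat(relation, n):
    """Estimated coherence (N(R) - N(S)) / (N(S) (N(S) - 1))."""
    return Fraction(len(relation) - n, n * (n - 1))


S = [1, 2, 3, 4]
R = relation_from_classes([{1, 2}, {3}, {4}])
R2 = relation_from_classes([{1}, {2}, {3, 4}])
union = R | R2
intersection = R & R2

# Assumption: the union is already an equivalence relation here,
# so it plays the role of the sum of R and R'.
assert union == relation_from_classes([{1, 2}, {3, 4}])
# Counting identity that holds for any two sets of pairs.
assert len(R) + len(R2) == len(union) + len(intersection)

n = len(S)
print(c_hat(R, n) + c_hat(R2, n), c_hat(union, n))  # 1/3 1/3
```

Here $N(R) = N(R') = 6$, $N(R \cup R') = 8$, and $N(R \cap R') = 4 = N(S)$, so each relation contributes $1/6$ and the combined relation has estimated coherence $1/3$.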

Having shown that $\hat{c}$ has several characteristics that make it a
good measure, we now show that $\hat{c}$ is also desirable as an estimator of
$\gamma$. In particular, we show that $\hat{c}$ is consistent and unbiased. We assume
that the sampling is with replacement.

Theorem 35.* $\hat{c}$ is an unbiased estimator of $\gamma$.

Proof: In this proof we use the following simplified notation:



*The proofs of this and all subsequent theorems were contributed by
Michael Kane.

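$$N = N(S), \qquad n_i = N(S_i),$$

so that the formula for $\hat{c}$ becomes

$$\hat{c} = \frac{\sum_{i=1}^{k} n_i (n_i - 1)}{N(N-1)} = \sum_{i=1}^{k} \frac{n_i^2 - n_i}{N(N-1)} .$$

We must show that $E(\hat{c}) = \gamma$. Observe that for $i = 1, 2, \ldots, k$, the random
variable $n_i$ is binomial with parameters $p_i$ and $N$, so that

$$E(n_i) = N p_i \quad \text{and} \quad \operatorname{var}(n_i) = N p_i (1 - p_i) .$$

Thus

$$E(n_i^2) = \operatorname{var}(n_i) + E^2(n_i) = N(N-1) p_i^2 + N p_i ,$$

so

$$\begin{aligned}
E(\hat{c}) &= E\left[ \sum_{i=1}^{k} \frac{n_i^2 - n_i}{N(N-1)} \right] \\
&= \sum_{i=1}^{k} \frac{E(n_i^2) - E(n_i)}{N(N-1)} \\
&= \sum_{i=1}^{k} \frac{N(N-1) p_i^2 + N p_i - N p_i}{N(N-1)} \\
&= \sum_{i=1}^{k} p_i^2 = \gamma .
\end{aligned}$$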
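The unbiasedness result can be checked exactly for a small case by enumerating every possible vector of class counts under the multinomial distribution. The sketch below is illustrative only; the helper names, the class probabilities, and the sample size are hypothetical choices, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction
from math import factorial


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Probability of the class counts under sampling with replacement."""
    coef = factorial(sum(counts))
    weight = Fraction(1)
    for n, p in zip(counts, probs):
        coef //= factorial(n)
        weight *= p ** n
    return coef * weight


def expectation(statistic, n_total, probs):
    """Exact expected value of statistic(counts) over samples of size n_total."""
    return sum(multinomial_pmf(c, probs) * statistic(c)
               for c in compositions(n_total, len(probs)))


def c_plain(counts):
    n = sum(counts)
    return Fraction(sum(m * m for m in counts), n * n)


def c_hat(counts):
    n = sum(counts)
    return Fraction(sum(m * (m - 1) for m in counts), n * (n - 1))


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical p_i
gamma = sum(p * p for p in probs)
print(gamma, expectation(c_hat, 4, probs), expectation(c_plain, 4, probs))
# 7/18 7/18 13/24
```

With $N = 4$ the expected value of $\hat{c}$ equals $\gamma = 7/18$ exactly, while the expected value of $c$ is $13/24$, which illustrates the bias of $c$ for a small sample.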
Definition 13. Let $f(N)$ be a polynomial function of $N$.
$f(N) = O(N^{\alpha})$ iff the highest power of $N$ in $f(N)$ is less than or
equal to $\alpha$.

Comment. The notation introduced in Definition 13 is used in
examining the asymptotic properties of functions. In the development
that follows, we are interested in the limiting values of polynomials
in $N$, as $N$ approaches infinity. If $f(N) = O(N^{\alpha})$, then $f(N)$ does not
grow faster than $N^{\alpha}$ as $N \to \infty$. This implies that

$$\lim_{N \to \infty} \frac{O(N^{\alpha})}{N^{\beta}} = 0 \quad \text{for } \beta > \alpha .$$



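Lemma. Let $\{n_1, \ldots, n_k\}$ be a set of random variables with a multinomial
distribution. Let $\alpha$ and $\beta$ be integer constants and let $n_i$ and $n_j$ ($i$
possibly equal to $j$) be any two random variables from the set. Then,
for any sample size $N$,

$$E(n_i^{\alpha} n_j^{\beta}) = N^{\alpha+\beta} p_i^{\alpha} p_j^{\beta} + O(N^{\alpha+\beta-1}) .$$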



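Proof: The proof is by induction on $\alpha$ and $\beta$. For $\alpha = 1$, $\beta = 0$,
we have from (2) in Theorem 35,

$$E(n_i) = N p_i = N p_i + O(N^0) = N p_i + O(1) .$$

Similarly, the lemma holds for $\alpha = 0$, $\beta = 1$. For $\alpha = \beta = 1$ (with $i \ne j$),

$$\begin{aligned}
E(n_i n_j) &= \sum n_i n_j \frac{N!}{n_1! \cdots n_k!} \, p_1^{n_1} \cdots p_k^{n_k} \\
&= N(N-1) p_i p_j \sum \frac{(N-2)!}{n_1! \cdots (n_i - 1)! \cdots (n_j - 1)! \cdots n_k!} \, p_1^{n_1} \cdots p_i^{n_i - 1} \cdots p_j^{n_j - 1} \cdots p_k^{n_k} \\
&= N(N-1) p_i p_j ,
\end{aligned}$$

and since $p_i p_j$ is a constant for any population,

$$E(n_i n_j) = N^2 p_i p_j + O(N) .$$

Assume that the lemma is true for $\alpha$ and $\beta$ less than $\theta$. For $\beta < \theta$,

$$\begin{aligned}
E(n_i^{\theta} n_j^{\beta}) &= \sum n_i^{\theta} n_j^{\beta} \frac{N!}{n_1! \cdots n_k!} \, p_1^{n_1} \cdots p_k^{n_k} \\
&= N p_i \sum n_i^{\theta - 1} n_j^{\beta} \frac{(N-1)!}{n_1! \cdots (n_i - 1)! \cdots n_k!} \, p_1^{n_1} \cdots p_i^{n_i - 1} \cdots p_k^{n_k} .
\end{aligned}$$

Up to terms of lower order in $N$, the summation above is simply
$E(n_i^{\theta-1} n_j^{\beta})$ for a sample size of $N-1$, and $\theta - 1$ and $\beta$ are both
less than $\theta$. Hence

$$E(n_i^{\theta} n_j^{\beta}) = N p_i \left[ (N-1)^{\theta-1+\beta} p_i^{\theta-1} p_j^{\beta} + O(N^{\theta+\beta-2}) \right] .$$

Now $(N-1)^{\theta+\beta-1} = N^{\theta+\beta-1} + O(N^{\theta+\beta-2})$, so

$$E(n_i^{\theta} n_j^{\beta}) = N^{\theta+\beta} p_i^{\theta} p_j^{\beta} + O(N^{\theta+\beta-1}) .$$

So the lemma holds for $\alpha = \theta$, $\beta < \theta$. Similar treatments would show that
the lemma also holds for $\beta = \theta$, $\alpha < \theta$ and for $\alpha = \beta = \theta$. Therefore, the
lemma has been proved.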
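The base case of the lemma can be checked exactly by enumeration. The following sketch is an illustration under stated assumptions, not the report's own procedure: the function names and the class probabilities are hypothetical, and the mixed moment is computed by summing over all multinomial outcomes.

```python
from fractions import Fraction
from math import factorial


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def moment(n_total, probs, i, j, alpha, beta):
    """Exact E(n_i^alpha * n_j^beta) for a multinomial sample of size n_total."""
    result = Fraction(0)
    for counts in compositions(n_total, len(probs)):
        coef = factorial(n_total)
        weight = Fraction(1)
        for n, p in zip(counts, probs):
            coef //= factorial(n)
            weight *= p ** n
        result += coef * weight * counts[i] ** alpha * counts[j] ** beta
    return result


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical p_i
for n_total in (5, 10, 20):
    exact = moment(n_total, probs, 0, 1, 1, 1)
    leading = n_total ** 2 * probs[0] * probs[1]
    # The gap between the leading term and the exact moment, divided by N,
    # stays constant, which is what a remainder of order O(N) looks like.
    print(n_total, exact, leading, (leading - exact) / n_total)
```

For these probabilities the exact moment is $N(N-1)/6$ and the leading term is $N^2/6$, so the remainder is $N/6$.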

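Theorem 36. $\hat{c}$ is a consistent estimator.

Proof: Since $\hat{c}$ is an unbiased estimator of $\gamma$, we need only show
that $\operatorname{var}(\hat{c})$ goes to zero as $N \to \infty$ in order to establish the consistency
of $\hat{c}$.

$$\operatorname{var}(\hat{c}) = E(\hat{c}^2) - [E(\hat{c})]^2 \qquad (1)$$

$$E(\hat{c}^2) = \frac{E\left[ \left( \sum_{i} (n_i^2 - n_i) \right)^2 \right]}{N^2 (N-1)^2} \qquad (2)$$

Using the lemma,

$$\begin{aligned}
E\left[ \left( \sum_{i} (n_i^2 - n_i) \right)^2 \right] &= N^4 \sum_{i} \sum_{j} p_i^2 p_j^2 + O(N^3) \\
&= N^4 \left( \sum_{i} p_i^2 \right)^2 + O(N^3) .
\end{aligned}$$

Inserting this in equation 2, we have

$$E(\hat{c}^2) = \frac{N^4 \left( \sum_{i} p_i^2 \right)^2 + O(N^3)}{N^2 (N-1)^2} ,$$

and inserting this in equation 1, we have

$$\operatorname{var}(\hat{c}) = \frac{\left[ N^4 \left( \sum_{i} p_i^2 \right)^2 + O(N^3) \right] - \left[ N^2 (N-1)^2 \left( \sum_{i} p_i^2 \right)^2 \right]}{N^2 (N-1)^2} = \frac{O(N^3)}{N^2 (N-1)^2} ,$$

$$\lim_{N \to \infty} \operatorname{var}(\hat{c}) = \lim_{N \to \infty} \frac{O(N^3)}{N^2 (N-1)^2} = 0 .$$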
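The shrinking variance can be observed exactly in the simplest case of two classes, where the class count is binomial. This is a sketch for illustration only; the function name, the restriction to two classes, and the class probability are assumptions made to keep the enumeration short.

```python
from fractions import Fraction
from math import comb


def var_c_hat_two_classes(n_total, p):
    """Exact var(c-hat) for two classes with probabilities p and 1 - p."""
    first = second = Fraction(0)
    for n1 in range(n_total + 1):
        n2 = n_total - n1
        prob = comb(n_total, n1) * p ** n1 * (1 - p) ** n2
        value = Fraction(n1 * (n1 - 1) + n2 * (n2 - 1),
                         n_total * (n_total - 1))
        first += prob * value            # accumulates E(c-hat)
        second += prob * value * value   # accumulates E(c-hat^2)
    return second - first * first


p = Fraction(1, 3)  # hypothetical class probability
for n_total in (2, 4, 8, 16, 32, 64):
    print(n_total, float(var_c_hat_two_classes(n_total, p)))
```

In the symmetric case $p = 1/2$ the exact variance works out to $1/(2N(N-1))$, which makes the convergence to zero explicit.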



Comment. A concept closely related to coherence is diversity.

Definition 14. Suppose a set $S$ is classified by $S_1, S_2, \ldots, S_k$
and that the probability that an element from $S$ is in $S_i$ is $p_i$. The
diversity of the classified set $S$ is

$$\delta = 1 - \gamma .$$

Comment. $\delta$ can also be written

$$\delta = 1 - \sum_{i=1}^{k} p_i^2 ,$$

and the related sample statistic is defined below.

Definition 15. Suppose a finite set $S$ is classified by $S_1, S_2, \ldots, S_k$
and that the corresponding equivalence relation is $R$. Then the sample
diversity of $S$ is

$$d = 1 - c .$$

Definition 16. Suppose a finite set $S$ is classified by $S_1, S_2, \ldots, S_k$
and that the corresponding equivalence relation is $R$. The estimated
diversity of $S$ is

$$\hat{d} = 1 - \hat{c} .$$

Comment. Most of the properties of the coherence measures $\gamma$, $c$,
and $\hat{c}$ have simple analogies for $\delta$, $d$, and $\hat{d}$. The exceptions are the
additivity theorems, Theorems 33 and 34. Just as $\hat{c}$ is a consistent,
unbiased estimator of coherence, so is $\hat{d}$ a consistent, unbiased estimator
of $\delta$:



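Theorem 37. $\hat{d}$ is an unbiased, consistent estimator of $\delta$.

Proof:

$$\begin{aligned}
\hat{d} &= 1 - \hat{c} \\
E(\hat{d}) &= 1 - E(\hat{c}) \\
&= 1 - \gamma \\
&= \delta .
\end{aligned}$$

Therefore $\hat{d}$ is unbiased.

$$\begin{aligned}
\operatorname{var}(\hat{d}) &= \operatorname{var}(1 - \hat{c}) \\
&= \operatorname{var}(\hat{c})
\end{aligned}$$

$$\lim_{N \to \infty} \operatorname{var}(\hat{d}) = \lim_{N \to \infty} \operatorname{var}(\hat{c}) = 0 .$$

Therefore $\hat{d}$ is consistent.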
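Both diversity statistics follow from the class sizes in the same way as the coherence statistics. The sketch below is illustrative; the function names and the example class sizes are hypothetical.

```python
from fractions import Fraction


def sample_diversity(class_sizes):
    """d = 1 - c, with c = sum(n_i^2) / N^2."""
    n = sum(class_sizes)
    return 1 - Fraction(sum(m * m for m in class_sizes), n * n)


def estimated_diversity(class_sizes):
    """d-hat = 1 - c-hat, with c-hat = sum(n_i (n_i - 1)) / (N (N - 1))."""
    n = sum(class_sizes)
    return 1 - Fraction(sum(m * (m - 1) for m in class_sizes), n * (n - 1))


sizes = [3, 2, 1]  # hypothetical class sizes
print(sample_diversity(sizes), estimated_diversity(sizes))  # 11/18 11/15
```

When every element forms its own class the estimated diversity is 1, and when all elements share one class both statistics are 0.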
